PASA: Installation, Configuration, and Main Program Options

1. Introduction to PASA

PASA, acronym for Program to Assemble Spliced Alignments, is a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data. PASA also identifies and classifies all splicing variations supported by the transcript alignments.

Note:
Combine genome and Trinity de novo RNA-Seq assemblies to generate a comprehensive transcript database.

2. Preparation before using PASA

2.1 Preparing the MySQL database

Create two users: one with read-only privileges and one with all privileges.

mysql> GRANT SELECT ON *.* TO 'pasa'@'%' IDENTIFIED BY '123456';
mysql> GRANT ALL ON *.* TO 'chenlianfu'@'%' IDENTIFIED BY '123456';
mysql> FLUSH PRIVILEGES;
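
The `GRANT ... IDENTIFIED BY` form above is accepted by MySQL 5.x. MySQL 8.0 removed it, so there the account has to be created with `CREATE USER` before privileges are granted. The sketch below only prints the equivalent statements for the two accounts. The helper name `pasa_grant_sql` is made up for illustration, and the passwords are the placeholder values used above.

```shell
#!/bin/sh
# Print CREATE USER + GRANT statements for one account.
# pasa_grant_sql is a hypothetical helper, not part of PASA or MySQL.
pasa_grant_sql() {
    user=$1; password=$2; privileges=$3
    printf "CREATE USER IF NOT EXISTS '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$password"
    printf "GRANT %s ON *.* TO '%s'@'%%';\n" "$privileges" "$user"
}

pasa_grant_sql pasa 123456 SELECT        # read-only account
pasa_grant_sql chenlianfu 123456 ALL     # all-privileges account
echo "FLUSH PRIVILEGES;"
```

Saved as a script, the output can be piped into `mysql -u root -p` to apply it.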

2.2 Installing Perl modules

# cpan
cpan[1]> install DBD::mysql
cpan[1]> install GD
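
A quick way to confirm that the modules load is to ask Perl to import them. This is a minimal sketch; `check_perl_module` is an illustrative helper name, and the module list simply repeats the two modules installed above.

```shell
#!/bin/sh
# Return 0 if the named Perl module loads, non-zero otherwise.
# check_perl_module is a hypothetical helper name.
check_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

for module in DBD::mysql GD; do
    if check_perl_module "$module"; then
        echo "$module: OK"
    else
        echo "$module: missing"
    fi
done
```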

2.3 Installing GMAP

$ wget http://research-pub.gene.com/gmap/src/gmap-gsnap-2013-03-31.v5.tar.gz
$ tar zxvf gmap-gsnap-2013-03-31.v5.tar.gz
$ cd gmap-2013-03-31
$ ./configure --prefix=$PWD
$ make -j 8
$ make install

2.4 Installing BLAT

$ wget http://hgwdev.cse.ucsc.edu/~kent/src/blatSrc35.zip
$ unzip blatSrc35.zip
$ cd blatSrc
$ MACHTYPE=x86_64
$ export MACHTYPE
$ mkdir -p ~/bin/x86_64
$ make -j 8

2.5 Installing FASTA

$ wget http://faculty.virginia.edu/wrpearson/fasta/fasta3/CURRENT.tar.gz
$ tar zxvf CURRENT.tar.gz
$ cd fasta-35.4.12
$ cd src
$ make -f ../make/Makefile.linux_sse2 all
$ cd ..
$ ln -s $PWD/bin/fasta35 ~/bin/fasta
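
The builds above leave their binaries in different places (GMAP under its own source directory because of `--prefix=$PWD`, BLAT under `~/bin/x86_64`, FASTA linked into `~/bin`). Assuming the tools are meant to be found through `PATH`, the following sketch reports which of them the shell can currently see. `have_tool` is an illustrative helper name.

```shell
#!/bin/sh
# Report whether each tool installed above can be found on PATH.
# have_tool is a hypothetical helper name.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

for tool in gmap blat fasta; do
    if have_tool "$tool"; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not on PATH"
    fi
done
```

Any tool reported as missing can be made visible by adding its directory to `PATH` in the shell profile.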

2.6 Installing PASA

$ wget http://kaz.dl.sourceforge.net/project/pasa/PASA2-r20130425beta.tgz
$ tar zxvf PASA2-r20130425beta.tgz
$ cd PASA2-r20130425beta/
$ make -j 8

2.7 Installing GD

GD requires libgd, so install libgd first.

$ wget https://bitbucket.org/libgd/gd-libgd/get/93368566388c.zip
$ unzip 93368566388c.zip
$ cd libgd-gd-libgd-93368566388c
$ ./bootstrap.sh
$ ./configure
$ make -j 8
$ sudo make install
$ gdlib-config

Then install GD.

$ wget http://search.cpan.org/CPAN/authors/id/L/LD/LDS/GD-2.49.tar.gz
$ tar zxvf GD-2.49.tar.gz
$ cd GD-2.49
$ perl Makefile.PL
$ make -j 8
$ sudo make install

GD is installed so that PASA results can be viewed through a web page.

2.8 Configuring PASA

2.8.1. Edit the PASA configuration file $PASAHOME/pasa_conf/conf.txt

$ cp $PASAHOME/pasa_conf/pasa.CONFIG.template $PASAHOME/pasa_conf/conf.txt
$ vim $PASAHOME/pasa_conf/conf.txt

2.8.2. Entries to change in this file:

PASA_ADMIN_EMAIL=(your email address)
MYSQLSERVER=(your mysql server name)   Do not enter an IP address here.
MYSQL_RO_USER=(mysql read-only username)
MYSQL_RO_PASSWORD=(mysql read-only password)
MYSQL_RW_USER=(mysql all privileges username)
MYSQL_RW_PASSWORD=(mysql all privileges password)
BASE_PASA_URL=http://server_name/pasa/cgi-bin/
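
These entries can also be set without opening an editor. The sketch below rewrites `KEY=VALUE` lines with `sed`; it works on a temporary stand-in file so that it runs anywhere. `set_conf_value` is an illustrative helper, and `localhost` is only an assumed host name. The user names are the ones created in 2.1.

```shell
#!/bin/sh
# Set KEY=VALUE in a PASA-style config file (one KEY=VALUE per line).
# set_conf_value is a hypothetical helper; values must not contain '|', '&' or '\'.
set_conf_value() {
    file=$1; key=$2; value=$3
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Stand-in for $PASAHOME/pasa_conf/conf.txt so the sketch runs anywhere.
conf=$(mktemp)
cat > "$conf" <<'EOF'
MYSQLSERVER=(your mysql server name)
MYSQL_RO_USER=(mysql read-only username)
MYSQL_RW_USER=(mysql all privileges username)
BASE_PASA_URL=http://server_name/pasa/cgi-bin/
EOF

set_conf_value "$conf" MYSQLSERVER localhost      # assumed host name, not an IP
set_conf_value "$conf" MYSQL_RO_USER pasa
set_conf_value "$conf" MYSQL_RW_USER chenlianfu
cat "$conf"
```

To apply it to the real file, point `conf` at `$PASAHOME/pasa_conf/conf.txt` instead of the temporary file.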

2.8.3. Edit the httpd configuration file:

# vim /etc/httpd/conf/httpd.conf
# /etc/init.d/httpd restart

Add the following lines to /etc/httpd/conf/httpd.conf:

ScriptAlias /pasa "$PASAHOME"
<Directory "$PASAHOME">
        Options MultiViews ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
</Directory>
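
Two details may need adjusting on a given server. The `Order`/`Allow` directives are Apache 2.2 syntax, and Apache 2.4 uses `Require all granted` instead. Also, assuming `$PASAHOME` in the lines above stands for the literal installation path, that path has to be written out in httpd.conf. The sketch below prints a 2.4-style fragment with the path filled in from the shell. The function name `pasa_httpd_fragment` and the `/opt/PASA` fallback are illustrative only.

```shell
#!/bin/sh
# Print an Apache 2.4-style fragment with the PASA path written out.
# /opt/PASA is only a fallback for illustration; set PASAHOME to the real location.
PASAHOME=${PASAHOME:-/opt/PASA}

# pasa_httpd_fragment is a hypothetical helper name.
pasa_httpd_fragment() {
    cat <<EOF
ScriptAlias /pasa "$PASAHOME"
<Directory "$PASAHOME">
        Options MultiViews ExecCGI
        AllowOverride None
        Require all granted
</Directory>
EOF
}

pasa_httpd_fragment
```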

2.9 Cleaning the transcript sequences [optional, requires seqclean to be installed]

Download two contaminant databases in FASTA format.

$ cd $PASAHOME/seqclean
$ tar zxf seqclean.tar.gz
$ cd seqclean
$ wget ftp://ftp.ncbi.nih.gov/pub/UniVec/UniVec -O UniVec.fasta
$ wget ftp://ftp.ncbi.nih.gov/pub/UniVec/UniVec_Core -O UniVec_Core.fasta

UniVec_Core includes only oligonucleotides and vectors consisting of bacterial, phage, viral, yeast or synthetic sequences. Vectors that include sequences of mammalian origin are excluded.

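3. Using the main PASA program

The main PASA program is $PASAHOME/scripts/Launch_PASA_pipeline.pl. Its options are as follows:

* marks a required option.

-c <filename> *
Alignment configuration file. Copy $PASAHOME/pasa_conf/pasa.alignAssembly.Template.txt and change only MYSQLDB in the copy to the name of the MySQL database to use.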

####################

spliced alignment settings:
--ALIGNERS <string>
Aligners to run. Available aligners are gmap and blat; both can be selected at once with 'gmap,blat'.

-N <int> default: 1
max number of top scoring alignments

--MAX_INTRON_LENGTH | -I <int>  default: 100000
max intron length parameter passed to GMAP or BLAT

--IMPORT_CUSTOM_ALIGNMENTS_GFF3 <filename>
only using the alignments supplied in the corresponding GFF3 file.

--cufflinks_gtf <filename>
incorporate cufflinks-generated transcripts

####################

actions
-C
    flag, create MYSQL database
-R
    flag, run alignment/assembly pipeline.
-A
    compare to annotated genes.
--ALT_SPLICE
    flag, run alternative splicing analysis

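-R aligns the transcripts; -A compares against and updates an existing GFF3 annotation. The two cannot be used in the same run, and each one needs its own configuration file passed to -c.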

####################

input files

-g <filename> *
    genome sequence FASTA file

-t <filename> *
    transcript db

-f <filename>
    file containing a list of fl-cdna accessions.

--TDN <filename>
    file containing a list of accessions corresponding to Trinity (full) de novo assemblies (not genome-guided)

####################

polyAdenylation site identification  ** highly recommended **
-T
    flag, transcript db was trimmed using the TGI seqclean tool.
-u <filename>
    value, transcript db containing untrimmed sequences (input to seqclean). A file with the same name plus a .cln extension, generated by seqclean, should also exist.
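
The relationship between the seqclean files and these options can be sketched as follows. The input name `transcripts.fasta` is a placeholder, and the `.clean` output name is an assumption about seqclean's naming; only the `.cln` report is stated above. seqclean is run only if it is installed and the input exists.

```shell
#!/bin/sh
# Derive the seqclean file names that go with -t / -T / -u.
raw=transcripts.fasta          # untrimmed input (placeholder name), passed to -u
vectors=UniVec.fasta           # contaminant database downloaded in 2.9

clean="$raw.clean"             # trimmed sequences (assumed seqclean naming), passed to -t
report="$raw.cln"              # cleaning report, expected next to $raw

if command -v seqclean >/dev/null 2>&1 && [ -f "$raw" ]; then
    seqclean "$raw" -v "$vectors"
fi

echo "cleaning report: $report"
echo "PASA options: -t $clean -T -u $raw"
```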

####################

Jump-starting or prematurely terminating
-x
    flag, print cmds only, don't process anything (useful to get indices for the -s or -e opts below).
-s <int>
    pipeline index to start running at (avoid rerunning searches).
-e <int>
    pipeline index where to stop running; this entry itself is not executed.

####################

Misc:
--TRANSDECODER
    flag, run transdecoder to identify candidate full-length coding transcripts
--CPU <int> default: 2
    multithreading
-d  flag, Debug 
-h  flag, print this option menu and quit
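
Putting the options together, an alignment-assembly run might look like the sketch below. It only prints the command line, first as-is and then with -x for a preview of the pipeline steps and their indices. The helper `pasa_align_cmd`, the file names, and the `/opt/PASA` fallback are placeholders, not values from PASA itself.

```shell
#!/bin/sh
# Build the command line for an alignment-assembly run (-C -R).
# /opt/PASA is only a fallback for illustration; set PASAHOME to the real location.
PASAHOME=${PASAHOME:-/opt/PASA}

# pasa_align_cmd CONFIG GENOME TRANSCRIPTS CPUS  (hypothetical helper)
pasa_align_cmd() {
    printf '%s -c %s -C -R -g %s -t %s --ALIGNERS gmap,blat --CPU %s\n' \
        "$PASAHOME/scripts/Launch_PASA_pipeline.pl" "$1" "$2" "$3" "$4"
}

cmd=$(pasa_align_cmd alignAssembly.config genome.fasta transcripts.fasta 8)

echo "$cmd"        # the full run
echo "$cmd -x"     # same command, printing the pipeline steps only
```

After reviewing the step list from the -x preview, a failed run can be resumed by adding `-s <index>` to the same command instead of starting over.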

