1. Introduction to PBJelly2
PBJelly2 uses PacBio data to fill gaps in a genome assembly and to join scaffolds.
2. Installing PBJelly2
Install HDF5:
$ wget http://www.hdfgroup.org/ftp/HDF5/current/bin/linux-centos6-x86_64/hdf5-1.8.15-patch1-linux-centos6-x86_64-shared.tar.gz
$ tar zxf hdf5-1.8.15-patch1-linux-centos6-x86_64-shared.tar.gz
$ mv hdf5-1.8.15-patch1-linux-centos6-x86_64-shared /opt/biosoft/hdf5-1.8.15-patch1
$ export HDF5INCLUDEDIR=/opt/biosoft/hdf5-1.8.15-patch1/include/
$ export HDF5LIBDIR=/opt/biosoft/hdf5-1.8.15-patch1/lib/
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/biosoft/hdf5-1.8.15-patch1/lib/
$ export C_INCLUDE_PATH=$C_INCLUDE_PATH:/opt/biosoft/hdf5-1.8.15-patch1/include/

Install Blasr (requires HDF5 v1.8.0 or later):
$ wget https://github.com/PacificBiosciences/blasr/archive/master.zip -O blasr.zip
$ unzip blasr.zip
$ mv blasr-master/ /opt/biosoft/blasr
$ cd /opt/biosoft/blasr
$ make -j 8
$ echo 'PATH=$PATH:/opt/biosoft/blasr/alignment/bin/' >> ~/.bashrc

Install the Python module Networkx v1.7:
$ wget https://pypi.python.org/packages/source/n/networkx/networkx-1.7.tar.gz
$ tar zxf networkx-1.7.tar.gz
$ cd networkx-1.7
$ sudo /usr/local/bin/python setup.py install

Install the Python module pyparsing:
$ wget https://pypi.python.org/packages/source/p/pyparsing/pyparsing-2.0.3.tar.gz
$ tar zxf pyparsing-2.0.3.tar.gz
$ cd pyparsing-2.0.3
$ sudo /usr/local/bin/python setup.py install

Install the Python module numpy:
$ wget https://pypi.python.org/packages/source/n/numpy/numpy-1.9.2.tar.gz
$ tar zxf numpy-1.9.2.tar.gz
$ cd numpy-1.9.2
$ sudo /usr/local/bin/python setup.py install

Install the Python module h5py:
$ wget https://pypi.python.org/packages/source/h/h5py/h5py-2.5.0.tar.gz
$ tar zxf h5py-2.5.0.tar.gz
$ cd h5py-2.5.0
$ export LIBRARY_PATH=/opt/biosoft/hdf5-1.8.15-patch1/lib/:$LIBRARY_PATH
$ /usr/local/bin/python setup.py build
$ sudo /usr/local/bin/python setup.py install

Install the Python module pysam:
$ wget https://pypi.python.org/packages/source/p/pysam/pysam-0.8.3.tar.gz
$ tar zxf pysam-0.8.3.tar.gz
$ cd pysam-0.8.3
$ sudo /usr/local/bin/python setup.py install

Install the Python module intervaltree:
$ wget https://pypi.python.org/packages/source/i/intervaltree/intervaltree-2.0.4.tar.gz
$ tar zxf intervaltree-2.0.4.tar.gz
$ cd intervaltree-2.0.4
$ sudo /usr/local/bin/python setup.py install

Install PBJelly2:
$ wget http://sourceforge.net/projects/pb-jelly/files/latest/download -O PBSuite.tar.gz
$ tar zxf PBSuite.tar.gz -C /opt/biosoft/
$ SWEETPATH=`ls /opt/biosoft/PBSuite* -d`
$ echo "perl -p -i -e 's#export SWEETPATH=.*#export SWEETPATH=$SWEETPATH#' $SWEETPATH/setup.sh" | sh
$ echo "source $SWEETPATH/setup.sh" >> ~/.bashrc
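
With this many dependencies it can help to confirm that everything is visible before the first run. The following is a minimal sketch, not part of PBJelly2: the helper names missing_commands and missing_modules are hypothetical, and the PYTHON variable is an assumption that should point at the interpreter used for the installs above.

```shell
#!/bin/sh
# Hypothetical helpers for a post-install sanity check (not part of PBSuite).

# Print each command from the argument list that is NOT found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Print each Python module from the argument list that fails to import.
# PYTHON is assumed to name the interpreter the modules were installed for;
# it falls back to whatever "python" resolves to on PATH.
missing_modules() {
    py=${PYTHON:-python}
    for mod in "$@"; do
        "$py" -c "import $mod" >/dev/null 2>&1 || echo "$mod"
    done
}
```

In a new shell (so that ~/.bashrc has been re-read), the check could be run as `missing_commands blasr Jelly.py` followed by `PYTHON=/usr/local/bin/python missing_modules networkx pyparsing numpy h5py pysam intervaltree`. No output means nothing is missing.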
3. Gap filling with PBJelly2
First create the configuration file Protocol.xml with the following contents:
<jellyProtocol>
    <reference>path to the genome FASTA file</reference>
    <outputDir>path to the output directory</outputDir>
    <blasr>-minMatch 8 -minPctIdentity 70 -bestn 1 -nCandidates 20 -maxScore -500 -nproc 24 -noSplitSubreads</blasr>
    <input baseDir="directory containing the input PacBio data files">
        <job>name of the PacBio data file</job>
    </input>
</jellyProtocol>
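
If the same protocol has to be written for several genomes, the file can be generated from a small shell function. This is only a sketch under stated assumptions: write_protocol is a hypothetical helper, its arguments are placeholders for your own paths, and the blasr options are copied unchanged from the template above apart from the thread count.

```shell
#!/bin/sh
# Hypothetical helper: print a Protocol.xml to standard output.
# Arguments: reference FASTA, output directory, input base directory,
#            PacBio data file name, thread count (optional, default 24).
# Note: values are inserted verbatim; XML special characters are not escaped.
write_protocol() {
    reference=$1
    output_dir=$2
    base_dir=$3
    job=$4
    nproc=${5:-24}
    cat <<EOF
<jellyProtocol>
    <reference>$reference</reference>
    <outputDir>$output_dir</outputDir>
    <blasr>-minMatch 8 -minPctIdentity 70 -bestn 1 -nCandidates 20 -maxScore -500 -nproc $nproc -noSplitSubreads</blasr>
    <input baseDir="$base_dir">
        <job>$job</job>
    </input>
</jellyProtocol>
EOF
}
```

For example, `write_protocol /data/genome.fasta /data/jelly_out /data/pacbio reads.fastq > Protocol.xml`, where all four paths are made-up placeholders.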
Then run the following 6 steps in order:
$ Jelly.py setup Protocol.xml
$ Jelly.py mapping Protocol.xml
$ Jelly.py support Protocol.xml
$ Jelly.py extraction Protocol.xml
$ Jelly.py assembly Protocol.xml -x "--nproc=24"
$ Jelly.py output Protocol.xml
The --nproc parameter sets the number of threads to use.
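
Because each stage depends on the one before it, the six commands can be wrapped so that a failure stops the run instead of letting later stages work on incomplete results. A minimal sketch, assuming Jelly.py is on PATH; the function name run_jelly_stages and the JELLY override variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical wrapper: run the six Jelly.py stages in order and stop at
# the first stage that exits with a non-zero status.
# Arguments: protocol file, thread count for the assembly stage (default 24).
# JELLY may be set to override the executable (an assumption for testing).
run_jelly_stages() {
    protocol=$1
    nproc=${2:-24}
    jelly=${JELLY:-Jelly.py}
    for stage in setup mapping support extraction assembly output; do
        echo "== $stage ==" >&2
        if [ "$stage" = assembly ]; then
            # Only the assembly stage receives the extra --nproc option.
            "$jelly" "$stage" "$protocol" -x "--nproc=$nproc" || return 1
        else
            "$jelly" "$stage" "$protocol" || return 1
        fi
    done
}
```

Calling `run_jelly_stages Protocol.xml 24` would then be equivalent to typing the six commands by hand, with the stage name printed to standard error before each one starts.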
The output file is jelly.out.fasta.
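
One rough way to see whether gaps were actually filled is to compare the number of N bases in the original genome and in jelly.out.fasta. The sketch below is an assumption-laden illustration rather than a PBJelly2 feature: count_n_bases is a hypothetical helper, and it treats every N or n on a sequence line as a gap base.

```shell
#!/bin/sh
# Hypothetical helper: count N/n characters in the sequence lines of a
# FASTA file, ignoring header lines that start with ">".
count_n_bases() {
    grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' '
}
```

Running `count_n_bases` on the input genome and again on jelly.out.fasta in the output directory gives two numbers to compare; a lower count in the second file suggests that some gaps were closed or shortened.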