blast进行重复序列屏蔽

1. 构建数据库的时候屏蔽参考序列的重复

segmasker 屏蔽氨基酸的低复杂序列
dustmasker 屏蔽核算序列的低复杂序列
windowmasker 按照序列重复的次数来屏蔽
convert2blastmask 根据小写字母来屏蔽

这几个都可以先得到一个含有屏蔽信息的文件。然后进行 makeblastdb 的时候输入这个文件,就可以相应的 masked 数据库了。

参考:http://www.ncbi.nlm.nih.gov/books/NBK279681/

2. 比对的时候对query序列的重复进行屏蔽

blast 比对的时候,可以对 query 序列进行屏蔽。 这几个参数估计这样理解:
-seg blastp的参数,是否对query 序列使用 segmasker来屏蔽低复杂重复,默认 no
-dust blastn的参数,是否对query 序列使用 dustmasker来屏蔽低复杂重复,默认 no
-lcase_masking 对query序列的小写部分进行屏蔽
-soft_masking 是否进行软屏蔽。软屏蔽则是不会使用屏蔽的序列进行种子比对,但是可以延长时候比对。硬屏蔽,则是直接不对屏蔽序列部分进行比对。blastn的默认值是yes,blastp的默认值是no

文档编辑经验点

1. 分节符的使用
点击:“页面布局”——“分隔符”——“分节符下一页”,在指定位置插入分节符,用于将文章不同的章节进行分割。这样可以保证:下一章节的第一行则总是在页面的最上面;下一章节的排版和上一章节可以不一致,例如纸张方向不一致。

2. 使用Endnote分别对每一章节插入文献
默认情况下Endnote是将文献插入到文章最后面的。若需要将文献插入到各个章节后面,则在Endnote中设置,例如:点击“Edit”——“Output Styles”——“Edit BMC genomics”——“Sections”——选中“Create a bibliography for each section”——退出保存该格式为另外一个名字,然后使用这个保存的格式。

3.

human genome h38 infromation downloading

Writing date: 2015-11-17.

The latest Human Genome assembly version is : GRCh38 (GCA_000001405.15) . GRch38: Genome Reference Consortium Human Reference 38.

The GRch38 genome browses:
UCSC http://genome.ucsc.edu/cgi-bin/hgGateway
Ensembl http://www.ensembl.org/Homo_sapiens/Info/Index
Vega http://vega.sanger.ac.uk/Homo_sapiens/Info/Index
GENCODE http://www.gencodegenes.org/human_biodalliance.html

The downloading website of GRch38 information in Ensembl: http://www.ensembl.org/info/data/ftp/index.html
I recommend to download gh38 sequence functional annotations from Ensembl: ftp://ftp.ensembl.org/pub/release-82/genbank/homo_sapiens/.

mdkir sequence_annotation
cd sequence_annotation
lftp -e "mirror -c --parallel=5 /pub/release-82/genbank/homo_sapiens/" ftp://ftp.ensembl.org
cd ..

The downloading website of GRch38 information in GENCODE: http://www.gencodegenes.org/releases/23.html
I recommend to download gh38 fasta and gff3 files from GENCODE. These 2 files would be the main fasta and gff3 files for most users.

wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_23/GRCh38.primary_assembly.genome.fa.gz
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_23/gencode.v23.basic.annotation.gff3.gz

SVG更改坐标系原点位置

在使用FigTree画树后。由于设置字体大小>14,于是导致export出来的图片中最上面一行字被截断了,从而使图片很丑。于是export出SVG格式文件。然后修改SVG坐标系原点位置,将图片完整显示出来。

在 <svg xmlns… 这行尾部添加 transform=”translate(0,20)” 解决。

纤维素,半纤维素和果胶的成份及其降解酶

1. 纤维素

Cellulose is a dominant structural polysaccharide in plants composed ofβ -D-glucose units with β-1,4-linkages.

Cellulose decomposition requires multiple enzymes. In general, cellulose is degraded to cellodextrin or cellobiose by the synergistic action of two cellulases: endoglucanase (EC 3.2.1.4) and cellobiohydrolase (EC 3.2.1.91) (Tomme et al., 1995; Bayer et al., 1998). Degradation of cellodextrin or cellobiose into monomeric glucose units requires another enzyme, β-glucosidase (EC 3.2.1.21), that hydrolyzes non-reducing 1,4-linked-β-glucose (Henrissat et al., 1989).

2. 半纤维素

Cellulose fibers are cross-linked by other polysaccharides called `hemicelluloses’ to increase the physical strength of the cell wall. Hemicelluloses include xylan (β-D-xylose units with β-1,4-linkages), glucomannan (β-D-mannose units andβ -D-glucose units with β-1,4-linkages), xyloglucan (β-D-glucose units with β-1,4-linkages, andβ -D-xylose and β-D-glucose units withβ -1,6-linkages), 1,3-1,4-β-glucan (β-D-glucose units with β-1,3- and β-1,4-linkages), and a relatively small amount of other polysaccharides composed of β-D-glucose,β -D-xylose, β-D-mannose and other sugar units with various linkages (McNeill et al., 1984).

3. 果胶

The scaffold of cellulose and hemicelluloses is filled with pectin (α-D-galacturonic acid units with mainly α-1,4-linkages), which functions as a cement-like substance in the cell wall.

reference:
Sakamoto, Kentaro, and Haruhiko Toyohara. “A comparative study of cellulase and hemicellulase activities of brackish water clam Corbicula japonica with those of other marine Veneroida bivalves.” Journal of Experimental Biology 212.17 (2009): 2812-2818.

通过WIG格式将转录组数据展示到Gbrowse2中

1. WIG格式介绍

WIG格式(Wiggle Track Format),可用于将转录组数据进行可视化展示。bigWig格式则是WIG格式的二进制方式,可以使用wigToBigWig将WIG格式转换成BigWig格式。
一个 WIG 格式实例文件:

track type=wiggle_0 name="sampleA1" description="RNA-Seq read counts of species A"
variableStep chrom=chr01 span=10
10001    13
10011    15
10021    12
fixedStep chrom=chr01 start=100031 step=10 span=10
17
15
20

如上例子,有2个注意点:

1. 第一行必须如理示例中格式。只有name和description这两个参数的值可以随意填写。
2. 有两种方法进行数据描述。分别是variableStep和fixedStep。前者数据内容用2行表示,后者数据部分仅用1行表示。
3. 这两种方法的几个参素意义为:
    chrom    设置序列名
    start    fixStep中Locus的起始位置
    step     fixStep中Locus的步进
    span     一个数据对应碱基数目

2. 将Bam文件转换成WIG文件并进行压缩

使用bam2wig命令将bam文件转换成wig文件。bam2wig命令可以来自于Augustus软件。

$ bam2wig sampleA1.tophat.bam > sampleA1.wig

该wig文件的span参数值为1。因此,当基因组越大,则wig文件越大。可以考虑设置span的值为10,能有效减小wig文件的大小。例如编写如些perl程序进行压缩wig文件。

#!/usr/bin/perl
use strict;

my $usage = <<USAGE;
Usage:
    perl $0 RNA-Seq.wig > RNA-Seq.cutdown.wig
USAGE
if (@ARGV==0){die $usage}

open IN, $ARGV[0] or die $!;

$_ = <>;
print;

my $locus = 1;
my $count = 0;
while () {
    if (m/^variableStep/) {
        $count = int(($count + 0.5) / 10);
        print "$locus\t$count\n" if $count > 0;
        s/$/ span=10/;
        print;
        $locus = 1;
    }
    else {
        if (m/(\d+)\s+(\d+)/) {
            my ($num1, $num2) = ($1, $2);
            if ($num1 >= $locus + 10) {
                $count = int(($count + 0.5) / 10);
                print "$locus $count\n" if $count > 0;
                $locus = $num1;
                $count = 0;
            }
            $count += $num2;
        }
    }
}

3. 将wig文件转换成wig binary文件和一个gff3文件

使用Gbrowse2所带命令 wiggle2gff3.pl 将wig文件转换成wig binary文件和一个gff3文件。每个基因组序列得到一个二进制格式的wig文件。同时生成一个gff3文件。该gff3文件指向所有的wig binary文件。

$ mkdir $PWD/gbrowse_track_of_RNA_seq
$ wiggle2gff3.pl --source=sampleA1 --method=RNA_Seq --path=$PWD/gbrowse_track_of_RNA_seq --trackname=track_A1 sampleA1.wig > sampleA1.gff3

4. 导入gff3文件到数据库,并配置Gbrowse配置文件

导入gff3文件

$ bp_seqfeature_load.pl -a DBI::mysql -d gbrowse2_species -u train -p 123456 sampleA1.gff3

配置文件:

[RNA_Seq_sampleA1_xyplot]
feature        = RNA_Seq:sampleA1
glyph          = wiggle_xyplot
graph_type     = boxes
height         = 50
scale          = right
description    = 1
category       = RNA-Seq:sampleA1
key            = Transcriptional Profile

[RNA_Seq_sampleA1_density]
feature        = RNA_Seq:sampleA1
glyph          = wiggle_density
height         = 30
bgcolor        = blue
description    = 1
category       = RNA-Seq:sampleA1
key            = Transcriptional Profile

Installing QIIME-1.9.1 on CentOS 6.5 (By Yue Zheng)

此方法是由郑越同学提供的。

QIIME consists of native Python 2 code and additionally wraps many external applications. As a consequence of this pipeline architecture, QIIME has a lot of dependencies and can be very challenging to install.

1. Setting up qiime-deploy on CentOS

1.1 sudo vim /etc/yum.repos.d/zeromq.repo

Paste the following into that file:

[home_fengshuo_zeromq]
name=The latest stable of zeromq builds (CentOS_CentOS-6)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/home:/fengshuo:/zeromq/CentOS_CentOS-6/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/home:/fengshuo:/zeromq/CentOS_CentOS-6/repodata/repomd.xml.key
enabled=1

Save and exit that file

1.2 Install the qiime-deploy dependencies on your machine

sudo yum groupinstall -y "development tools"
sudo yum install -y ant compat-gcc-34-g77 java-1.6.0-openjdk java-1.6.0-openjdk-devel freetype freetype-devel zlib-devel mpich2 readline-devel zeromq zeromq-devel gsl gsl-devel libxslt libpng libpng-devel libgfortran mysql mysql-devel libXt libXt-devel libX11-devel mpich2 mpich2-devel libxml2 xorg-x11-server-Xorg dejavu* python-devel sqlite-devel tcl-devel tk-devel R R-devel ghc

2. Installing requisite Python and R packages

# Installing sqlite-devel
sudo yum install sqlite-devel –y

# Installing Python 2.7
wget https://www.python.org/ftp/python/2.7.8/Python-2.7.8.tgz
tar xf Python-2.7.8.tgz
cd Python-2.7.8
./configure --prefix=/usr
make && make install

# Install setuptools & pip
# First get the setup script for Setuptools:
wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
# Then install it for Python 2.7 :
sudo python2.7 ez_setup.py
# Now install pip using the newly installed setuptools:
sudo easy_install-2.7 pip
# With pip installed you can now do things like this:
pip2.7 install [packagename]

# Install virtualenv for Python 2.7
sudo pip2.7 install virtualenv

# Check the system Python interpreter version
python --version
# This will show Python 2.7.8

# Maybe you will found yum can not be used this moment, because yum is associated with python2.6. Thus, we modified the yum conf files to use python2.6
sudo vim /usr/bin/yum
# Replace “#!/usr/bin/python” by “#!/usr/bin/python2.6”
# Installing R packages
# Run R and execute the following commands
install.packages(c('ape', 'biom', 'optparse', 'RColorBrewer', 'randomForest', 'vegan'))
source('http://bioconductor.org/biocLite.R')
biocLite(c('DESeq2', 'metagenomeSeq'))
q()

3. Install the latest QIIME release and its base dependencies is with pip

sudo pip2.7 install numpy
sudo pip2.7 install qiime -v
# For Chines user, you may find the suspend of pip, as the limitation of network. For example, If FastTree cannot be download, you can download it by another port of internet, and then post the install package into your local address. Next step, Downloading the qiime-1.9.1.tar.gz and changing the description of FastTree in setpu.py. After you modified the qiime-1.9.1.tar.gz you can post it into your local address. Finally, run sudo pip2.7 install qiime –v –i [local address]

# Installing QIIME 1.9.0's dependencies
# Downloading the zip packages of ‘qiime deploy’ and ‘qiime deploy conf’ from Github
cd
unzip qiime-deploy-master.zip qiime-deploy-conf-master.zip
mkdir ~/qiime_software
cd qiime-deploy-master
sudo python2.7 qiime-deploy.py ~/qiime_software/ -f ~/qiime-deploy-conf/qiime/qiime-1.9.1/qiime.conf --force-remove-failed-dirs
# After this step, it will display the list including ‘Packages deployed successfully’, ‘Packages skipped’ and ‘Packages failed to deply’
source ~/qiime_software/active.sh
print_qiime_config.py –tf

# If there are some packages were uninstalled, you should install them manually
# For example, usearch and amplicannoise were failed to install.

# Installing usearch manually
# Visting http://www.drive5.com/usearch/download.html to download the USEARCH v5.2.236
# Moving the binary file into /usr/bin and change the name as usearch, then chmod 755 [the binary file] 

# Installing usearch manually
# Downloading the AmpliconNoiseV1.27.tar.gz
tar -xvzf AmpliconNoiseV1.27.tar.gz
cd AmpliconNoiseV1.27
make clean
make
make install
echo "export PATH=$HOME/AmpliconNoiseV1.27/Scripts:$HOME/AmpliconNoiseV1.27/bin:$PATH" >> $HOME/.bashrc
echo "export PYRO_LOOKUP_FILE=$HOME/AmpliconNoiseV1.27/Data/LookUp_E123.dat" >> $HOME/.bashrc
echo "export SEQ_LOOKUP_FILE=$HOME/AmpliconNoiseV1.27/Data/Tran.dat" >> $HOME/.bashrc

# PATH Environment Variable
echo "export PATH=$HOME/bin/:$PATH" >> $HOME/.bashrc
source $HOME/.bashrc

# Finnaly verification
source ~/qiime_software/active.sh
print_qiime_config.py –tf

邮件服务器的简单搭建

1. 邮件服务器域名解析

首先,我在万网上解析域名如下:

记录类型    主机记录    记录值
A           mail        115.29.105.12
MX          @           mail.chenlianfu.com
TXT         @           v=spf1 a mx -all

2. CentOS postfix 设置

然后修改 CetnOS 系统下的 PostFix 的配置文件 /etc/postfix/main.cf , 修改的内容如下:

myhostname = mail.chenlianfu.com
mydomain = chenlianfu.com
myorigin = $mydomain
inet_interfaces = all
inet_protocols = ipv4
mydestination = $myhostname, localhost.$mydomain, localhost, $mydomain
mynetworks = 127.0.0.0/8, 168.100.189.0/28, hash:/etc/postfix/access
relay_domains = $mydestination
home_mailbox = Maildir/
mail_spool_directory = /var/spool/mail
message_size_limit = 52428800

然后运行如下命令启动 Postfix 服务:

# postmap hash:/etc/postfix/access 
# postalias hash:/etc/aliases
# /etc/init.d/postfix check
# /etc/init.d/postfix restart
# 

3. 使用 mail 命令发送邮件

mail命令参数:

-s subject
    邮件的标题。若标题有空格,则需要使用引号。
-a attachment
    将目标文件作为附件发送。若有多个附件需要发送,则使用多个该参数。
-c address
    抄送副本到邮件地址列表。这些邮件地址使用逗号分隔。抄送的邮件地址和收件人地址能
被所收件地址看到。
-b address
    暗送的邮件地址列表。这些邮件地址使用逗号隔开。暗送的邮件地址不能被其收件地址看
到。故mail命令不能将邮件分别发送到邮件地址列表。

使用例子:

$ mail -s "a e-mail subject" -a ./test.tar.gz chenllianfu@foxmail.com < mail_content
$ cat mail_content | mail -s "a e-mail subject" -a ./test.tar.gz chenllianfu@foxmail.com
$ echo "mail_content" | mail -s "a e-mail subject" -a ./test.tar.gz chenllianfu@foxmail.com
$ mail -s "a e-mail subject" -a ./test.tar.gz chenllianfu@foxmail.com
input
EOT