通过WIG格式将转录组数据展示到Gbrowse2中

1. WIG格式介绍

WIG格式(Wiggle Track Format),可用于将转录组数据进行可视化展示。bigWig格式则是WIG格式的二进制方式,可以使用wigToBigWig将WIG格式转换成BigWig格式。
一个 WIG 格式实例文件:

track type=wiggle_0 name="sampleA1" description="RNA-Seq read counts of species A"
variableStep chrom=chr01 span=10
10001    13
10011    15
10021    12
fixedStep chrom=chr01 start=100031 step=10 span=10
17
15
20

如上例子,有2个注意点:

1. 第一行必须如理示例中格式。只有name和description这两个参数的值可以随意填写。
2. 有两种方法进行数据描述。分别是variableStep和fixedStep。前者数据内容用2行表示,后者数据部分仅用1行表示。
3. 这两种方法的几个参素意义为:
    chrom    设置序列名
    start    fixStep中Locus的起始位置
    step     fixStep中Locus的步进
    span     一个数据对应碱基数目

2. 将Bam文件转换成WIG文件并进行压缩

使用bam2wig命令将bam文件转换成wig文件。bam2wig命令可以来自于Augustus软件。

$ bam2wig sampleA1.tophat.bam > sampleA1.wig

该wig文件的span参数值为1。因此,当基因组越大,则wig文件越大。可以考虑设置span的值为10,能有效减小wig文件的大小。例如编写如些perl程序进行压缩wig文件。

#!/usr/bin/perl
use strict;

my $usage = <<USAGE;
Usage:
    perl $0 RNA-Seq.wig > RNA-Seq.cutdown.wig
USAGE
if (@ARGV==0){die $usage}

open IN, $ARGV[0] or die $!;

$_ = <>;
print;

my $locus = 1;
my $count = 0;
while () {
    if (m/^variableStep/) {
        $count = int(($count + 0.5) / 10);
        print "$locus\t$count\n" if $count > 0;
        s/$/ span=10/;
        print;
        $locus = 1;
    }
    else {
        if (m/(\d+)\s+(\d+)/) {
            my ($num1, $num2) = ($1, $2);
            if ($num1 >= $locus + 10) {
                $count = int(($count + 0.5) / 10);
                print "$locus $count\n" if $count > 0;
                $locus = $num1;
                $count = 0;
            }
            $count += $num2;
        }
    }
}

3. 将wig文件转换成wig binary文件和一个gff3文件

使用Gbrowse2所带命令 wiggle2gff3.pl 将wig文件转换成wig binary文件和一个gff3文件。每个基因组序列得到一个二进制格式的wig文件。同时生成一个gff3文件。该gff3文件指向所有的wig binary文件。

$ mkdir $PWD/gbrowse_track_of_RNA_seq
$ wiggle2gff3.pl --source=sampleA1 --method=RNA_Seq --path=$PWD/gbrowse_track_of_RNA_seq --trackname=track_A1 sampleA1.wig > sampleA1.gff3

4. 导入gff3文件到数据库,并配置Gbrowse配置文件

导入gff3文件

$ bp_seqfeature_load.pl -a DBI::mysql -d gbrowse2_species -u train -p 123456 sampleA1.gff3

配置文件:

[RNA_Seq_sampleA1_xyplot]
feature        = RNA_Seq:sampleA1
glyph          = wiggle_xyplot
graph_type     = boxes
height         = 50
scale          = right
description    = 1
category       = RNA-Seq:sampleA1
key            = Transcriptional Profile

[RNA_Seq_sampleA1_density]
feature        = RNA_Seq:sampleA1
glyph          = wiggle_density
height         = 30
bgcolor        = blue
description    = 1
category       = RNA-Seq:sampleA1
key            = Transcriptional Profile

Linux虚拟内存相关设置

1. 临时增加或减少虚拟内存

首先,创建一个指定大小的文件。例如创建一个大小为 200G 的文件

# dd if=/dev/zero of=swapadd1 bs=4096 count=52428800

然后,将该文件加入到交换分区

# mkswap swapadd1
# swapon swapadd1

关闭某一个文件指向的虚拟内存

# swapoff -v swapadd1

2. 设置虚拟内存开始利用的阈值

在CentOS8系统中,当物理内存使用量达到80%时,则开始使用虚拟内存。若想优先全部使用物理内存,当100%使用完毕物理内存后,才开始使用虚拟内存,则可以进行如下设置:

# sysctl vm.swappiness=0
# cat /proc/sys/vm/swappiness

3. 释放虚拟内存

有时候发现物理内存剩余很多,而需您内存使用了不少。这时可以使用以下命令来将虚拟内存数据转移到物理内存,并关闭虚拟内存:

# swapoff -a

然后再重新启用虚拟内存:

# swapon -a

Installing QIIME-1.9.1 on CentOS 6.5 (By Yue Zheng)

此方法是由郑越同学提供的。

QIIME consists of native Python 2 code and additionally wraps many external applications. As a consequence of this pipeline architecture, QIIME has a lot of dependencies and can be very challenging to install.

1. Setting up qiime-deploy on CentOS

1.1 sudo vim /etc/yum.repos.d/zeromq.repo

Paste the following into that file:

[home_fengshuo_zeromq]
name=The latest stable of zeromq builds (CentOS_CentOS-6)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/home:/fengshuo:/zeromq/CentOS_CentOS-6/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/home:/fengshuo:/zeromq/CentOS_CentOS-6/repodata/repomd.xml.key
enabled=1

Save and exit that file

1.2 Install the qiime-deploy dependencies on your machine

sudo yum groupinstall -y "development tools"
sudo yum install -y ant compat-gcc-34-g77 java-1.6.0-openjdk java-1.6.0-openjdk-devel freetype freetype-devel zlib-devel mpich2 readline-devel zeromq zeromq-devel gsl gsl-devel libxslt libpng libpng-devel libgfortran mysql mysql-devel libXt libXt-devel libX11-devel mpich2 mpich2-devel libxml2 xorg-x11-server-Xorg dejavu* python-devel sqlite-devel tcl-devel tk-devel R R-devel ghc

2. Installing requisite Python and R packages

# Installing sqlite-devel
sudo yum install sqlite-devel –y

# Installing Python 2.7
wget https://www.python.org/ftp/python/2.7.8/Python-2.7.8.tgz
tar xf Python-2.7.8.tgz
cd Python-2.7.8
./configure --prefix=/usr
make && make install

# Install setuptools & pip
# First get the setup script for Setuptools:
wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
# Then install it for Python 2.7 :
sudo python2.7 ez_setup.py
# Now install pip using the newly installed setuptools:
sudo easy_install-2.7 pip
# With pip installed you can now do things like this:
pip2.7 install [packagename]

# Install virtualenv for Python 2.7
sudo pip2.7 install virtualenv

# Check the system Python interpreter version
python --version
# This will show Python 2.7.8

# Maybe you will found yum can not be used this moment, because yum is associated with python2.6. Thus, we modified the yum conf files to use python2.6
sudo vim /usr/bin/yum
# Replace “#!/usr/bin/python” by “#!/usr/bin/python2.6”
# Installing R packages
# Run R and execute the following commands
install.packages(c('ape', 'biom', 'optparse', 'RColorBrewer', 'randomForest', 'vegan'))
source('http://bioconductor.org/biocLite.R')
biocLite(c('DESeq2', 'metagenomeSeq'))
q()

3. Install the latest QIIME release and its base dependencies is with pip

sudo pip2.7 install numpy
sudo pip2.7 install qiime -v
# For Chines user, you may find the suspend of pip, as the limitation of network. For example, If FastTree cannot be download, you can download it by another port of internet, and then post the install package into your local address. Next step, Downloading the qiime-1.9.1.tar.gz and changing the description of FastTree in setpu.py. After you modified the qiime-1.9.1.tar.gz you can post it into your local address. Finally, run sudo pip2.7 install qiime –v –i [local address]

# Installing QIIME 1.9.0's dependencies
# Downloading the zip packages of ‘qiime deploy’ and ‘qiime deploy conf’ from Github
cd
unzip qiime-deploy-master.zip qiime-deploy-conf-master.zip
mkdir ~/qiime_software
cd qiime-deploy-master
sudo python2.7 qiime-deploy.py ~/qiime_software/ -f ~/qiime-deploy-conf/qiime/qiime-1.9.1/qiime.conf --force-remove-failed-dirs
# After this step, it will display the list including ‘Packages deployed successfully’, ‘Packages skipped’ and ‘Packages failed to deply’
source ~/qiime_software/active.sh
print_qiime_config.py –tf

# If there are some packages were uninstalled, you should install them manually
# For example, usearch and amplicannoise were failed to install.

# Installing usearch manually
# Visting http://www.drive5.com/usearch/download.html to download the USEARCH v5.2.236
# Moving the binary file into /usr/bin and change the name as usearch, then chmod 755 [the binary file] 

# Installing usearch manually
# Downloading the AmpliconNoiseV1.27.tar.gz
tar -xvzf AmpliconNoiseV1.27.tar.gz
cd AmpliconNoiseV1.27
make clean
make
make install
echo "export PATH=$HOME/AmpliconNoiseV1.27/Scripts:$HOME/AmpliconNoiseV1.27/bin:$PATH" >> $HOME/.bashrc
echo "export PYRO_LOOKUP_FILE=$HOME/AmpliconNoiseV1.27/Data/LookUp_E123.dat" >> $HOME/.bashrc
echo "export SEQ_LOOKUP_FILE=$HOME/AmpliconNoiseV1.27/Data/Tran.dat" >> $HOME/.bashrc

# PATH Environment Variable
echo "export PATH=$HOME/bin/:$PATH" >> $HOME/.bashrc
source $HOME/.bashrc

# Finnaly verification
source ~/qiime_software/active.sh
print_qiime_config.py –tf

邮件服务器的简单搭建

1. 邮件服务器域名解析

首先,我在万网上解析域名如下:

记录类型    主机记录    记录值
A           mail        115.29.105.12
MX          @           mail.chenlianfu.com
TXT         @           v=spf1 a mx -all

2. CentOS postfix 设置

然后修改 CetnOS 系统下的 PostFix 的配置文件 /etc/postfix/main.cf , 修改的内容如下:

myhostname = mail.chenlianfu.com
mydomain = chenlianfu.com
myorigin = $mydomain
inet_interfaces = all
inet_protocols = ipv4
mydestination = $myhostname, localhost.$mydomain, localhost, $mydomain
mynetworks = 127.0.0.0/8, 168.100.189.0/28, hash:/etc/postfix/access
relay_domains = $mydestination
home_mailbox = Maildir/
mail_spool_directory = /var/spool/mail
message_size_limit = 52428800

然后运行如下命令启动 Postfix 服务:

# postmap hash:/etc/postfix/access 
# postalias hash:/etc/aliases
# /etc/init.d/postfix check
# /etc/init.d/postfix restart
# 

3. 使用 mail 命令发送邮件

mail命令参数:

-s subject
    邮件的标题。若标题有空格,则需要使用引号。
-a attachment
    将目标文件作为附件发送。若有多个附件需要发送,则使用多个该参数。
-c address
    抄送副本到邮件地址列表。这些邮件地址使用逗号分隔。抄送的邮件地址和收件人地址能
被所收件地址看到。
-b address
    暗送的邮件地址列表。这些邮件地址使用逗号隔开。暗送的邮件地址不能被其收件地址看
到。故mail命令不能将邮件分别发送到邮件地址列表。

使用例子:

$ mail -s "a e-mail subject" -a ./test.tar.gz chenllianfu@foxmail.com < mail_content
$ cat mail_content | mail -s "a e-mail subject" -a ./test.tar.gz chenllianfu@foxmail.com
$ echo "mail_content" | mail -s "a e-mail subject" -a ./test.tar.gz chenllianfu@foxmail.com
$ mail -s "a e-mail subject" -a ./test.tar.gz chenllianfu@foxmail.com
input
EOT