Downloading human genome GRCh38 information

Written: 2015-11-17.

As of this writing, the latest human genome assembly is GRCh38 (GCA_000001405.15). GRCh38 stands for Genome Reference Consortium Human Build 38.

GRCh38 genome browsers:

GRCh38 download site at Ensembl:
I recommend downloading the GRCh38 sequence functional annotations from Ensembl:

mkdir sequence_annotation
cd sequence_annotation
lftp -e "mirror -c --parallel=5 /pub/release-82/genbank/homo_sapiens/"
cd ..

GRCh38 download site at GENCODE:
I recommend downloading the GRCh38 FASTA and GFF3 files from GENCODE. These two files serve as the main FASTA and GFF3 files for most users.




To fix this, append transform="translate(0,20)" to the end of the <svg xmlns… line.


1. Cellulose

Cellulose is a dominant structural polysaccharide in plants composed of β-D-glucose units with β-1,4-linkages.

Cellulose decomposition requires multiple enzymes. In general, cellulose is degraded to cellodextrin or cellobiose by the synergistic action of two cellulases: endoglucanase (EC 3.2.1.4) and cellobiohydrolase (EC 3.2.1.91) (Tomme et al., 1995; Bayer et al., 1998). Degradation of cellodextrin or cellobiose into monomeric glucose units requires another enzyme, β-glucosidase (EC 3.2.1.21), that hydrolyzes non-reducing 1,4-linked-β-glucose (Henrissat et al., 1989).

2. Hemicellulose

Cellulose fibers are cross-linked by other polysaccharides called "hemicelluloses" to increase the physical strength of the cell wall. Hemicelluloses include xylan (β-D-xylose units with β-1,4-linkages), glucomannan (β-D-mannose units and β-D-glucose units with β-1,4-linkages), xyloglucan (β-D-glucose units with β-1,4-linkages, and β-D-xylose and β-D-glucose units with β-1,6-linkages), 1,3-1,4-β-glucan (β-D-glucose units with β-1,3- and β-1,4-linkages), and a relatively small amount of other polysaccharides composed of β-D-glucose, β-D-xylose, β-D-mannose and other sugar units with various linkages (McNeill et al., 1984).

3. Pectin

The scaffold of cellulose and hemicelluloses is filled with pectin (α-D-galacturonic acid units with mainly α-1,4-linkages), which functions as a cement-like substance in the cell wall.

Sakamoto, Kentaro, and Haruhiko Toyohara. “A comparative study of cellulase and hemicellulase activities of brackish water clam Corbicula japonica with those of other marine Veneroida bivalves.” Journal of Experimental Biology 212.17 (2009): 2812-2818.


1. Introduction to the WIG format

The WIG format (Wiggle Track Format) can be used to visualize transcriptome data. The bigWig format is the binary form of WIG; wigToBigWig converts a WIG file to bigWig.
An example WIG file:

track type=wiggle_0 name="sampleA1" description="RNA-Seq read counts of species A"
variableStep chrom=chr01 span=10
10001    13
10011    15
10021    12
fixedStep chrom=chr01 start=100031 step=10 span=10


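1. The first line must follow the format shown in the example. Only the values of the name and description parameters may be chosen freely.
2. There are two ways to describe the data: variableStep and fixedStep. In the former, each data line has two columns (position and value); in the latter, each data line has a single column (value only).
3. The parameters of these two methods mean:
    chrom    the sequence name
    start    the start position of the first locus in fixedStep
    step     the distance between successive loci in fixedStep
    span     the number of bases covered by each data value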
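
The two declaration styles can be illustrated with a small reader that expands each data line into an explicit interval. This is only a minimal sketch in Python, not part of any WIG toolkit: the function name parse_wig is hypothetical, the fixedStep values 12 and 9 in the sample are invented, and a missing span or step is assumed to default to 1.

```python
def parse_wig(lines):
    """Yield (chrom, start, end, value) for each data line of a WIG file.

    Coordinates are 1-based and inclusive; `end` is start + span - 1.
    """
    mode = None      # "variableStep" or "fixedStep"
    chrom = None
    span = 1
    step = 1
    position = None  # next start position in fixedStep mode
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(field.split("=", 1) for field in fields[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                position = int(params["start"])
                step = int(params.get("step", 1))
            continue
        if mode == "variableStep":
            # Two columns: position and value.
            start_text, value_text = line.split()
            start = int(start_text)
            yield chrom, start, start + span - 1, float(value_text)
        elif mode == "fixedStep":
            # One column: the value; the position advances by `step`.
            yield chrom, position, position + span - 1, float(line)
            position += step
        else:
            raise ValueError("data line before any declaration line: " + line)


sample = [
    'track type=wiggle_0 name="sampleA1" description="RNA-Seq read counts of species A"',
    "variableStep chrom=chr01 span=10",
    "10001    13",
    "10011    15",
    "fixedStep chrom=chr01 start=100031 step=10 span=10",
    "12",  # invented value, for illustration only
    "9",   # invented value, for illustration only
]
records = list(parse_wig(sample))
```

Each record carries its own interval, so the same downstream code can handle both styles.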

2. Converting a BAM file to a WIG file and compressing it


$ bam2wig sampleA1.tophat.bam > sampleA1.wig


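#!/usr/bin/perl
use strict;
use warnings;

my $usage = <<USAGE;
Usage:
    perl $0 RNA-Seq.wig > RNA-Seq.cutdown.wig
USAGE
if (@ARGV == 0) { die $usage }

open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";

# Pass the track definition line through unchanged.
$_ = <$in>;
print;

my $locus = 1;    # start position of the current 10-bp bin
my $count = 0;    # summed depth inside the current bin
while (<$in>) {
    if (m/^variableStep/) {
        # Flush the last bin of the previous sequence before starting a new one.
        $count = int(($count + 0.5) / 10);
        print "$locus\t$count\n" if $count > 0;
        s/$/ span=10/;
        print;
        $locus = 1;
        $count = 0;
    }
    elsif (m/(\d+)\s+(\d+)/) {
        my ($num1, $num2) = ($1, $2);
        # Start a new bin once the position leaves the current 10-bp window.
        if ($num1 >= $locus + 10) {
            $count = int(($count + 0.5) / 10);
            print "$locus\t$count\n" if $count > 0;
            $locus = $num1;
            $count = 0;
        }
        $count += $num2;
    }
}

# Flush the final bin.
$count = int(($count + 0.5) / 10);
print "$locus\t$count\n" if $count > 0;
close $in;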
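
The binning idea behind the script can also be sketched in Python for a single variableStep block. This is one reading of the script, not a drop-in replacement: the function name bin_depth and the sample depths are hypothetical, and the per-bin value is assumed to be the summed depth divided by the bin width, rounded down, with zero-valued bins dropped.

```python
def bin_depth(positions_and_depths, width=10):
    """Collapse per-base (position, depth) pairs into bins of `width` bases.

    A new bin starts at the first position that lies at least `width` bases
    past the start of the current bin. Returns a list of (bin_start, value)
    pairs, where value is the summed depth divided by `width`, rounded down.
    """
    bins = []
    bin_start = None
    total = 0
    for position, depth in positions_and_depths:
        if bin_start is None:
            bin_start = position
        elif position >= bin_start + width:
            value = total // width
            if value > 0:
                bins.append((bin_start, value))
            bin_start = position
            total = 0
        total += depth
    # Flush the last bin.
    if bin_start is not None:
        value = total // width
        if value > 0:
            bins.append((bin_start, value))
    return bins


# Invented depths, for illustration only.
per_base = [(10001 + i, 20) for i in range(10)]
per_base += [(10011 + i, 5) for i in range(10)]
per_base += [(10030, 40)]
binned = bin_depth(per_base)
```

Because a bin starts wherever the next covered position falls, bins are not aligned to fixed multiples of the width.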

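3. Converting the wig file into wig binary files and a gff3 file

Use the command bundled with GBrowse2 to convert the wig file into wig binary files and a gff3 file. Each genome sequence gets one binary wig file, and a single gff3 file is generated that points to all of the wig binary files.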

$ mkdir $PWD/gbrowse_track_of_RNA_seq
$ --source=sampleA1 --method=RNA_Seq --path=$PWD/gbrowse_track_of_RNA_seq --trackname=track_A1 sampleA1.wig > sampleA1.gff3

4. Importing the gff3 file into the database and editing the GBrowse configuration file


$ -a DBI::mysql -d gbrowse2_species -u train -p 123456 sampleA1.gff3


feature        = RNA_Seq:sampleA1
glyph          = wiggle_xyplot
graph_type     = boxes
height         = 50
scale          = right
description    = 1
category       = RNA-Seq:sampleA1
key            = Transcriptional Profile

feature        = RNA_Seq:sampleA1
glyph          = wiggle_density
height         = 30
bgcolor        = blue
description    = 1
category       = RNA-Seq:sampleA1
key            = Transcriptional Profile

Installing QIIME-1.9.1 on CentOS 6.5 (By Yue Zheng)


QIIME consists of native Python 2 code and additionally wraps many external applications. As a consequence of this pipeline architecture, QIIME has a lot of dependencies and can be very challenging to install.

1. Setting up qiime-deploy on CentOS

1.1 sudo vim /etc/yum.repos.d/zeromq.repo

Paste the following into that file:

name=The latest stable of zeromq builds (CentOS_CentOS-6)

Save and exit that file

1.2 Install the qiime-deploy dependencies on your machine

sudo yum groupinstall -y "development tools"
sudo yum install -y ant compat-gcc-34-g77 java-1.6.0-openjdk java-1.6.0-openjdk-devel freetype freetype-devel zlib-devel mpich2 readline-devel zeromq zeromq-devel gsl gsl-devel libxslt libpng libpng-devel libgfortran mysql mysql-devel libXt libXt-devel libX11-devel mpich2 mpich2-devel libxml2 xorg-x11-server-Xorg dejavu* python-devel sqlite-devel tcl-devel tk-devel R R-devel ghc

2. Installing requisite Python and R packages

# Installing sqlite-devel
sudo yum install sqlite-devel -y

# Installing Python 2.7
tar xf Python-2.7.8.tgz
cd Python-2.7.8
./configure --prefix=/usr
make && sudo make install

# Install setuptools & pip
# First get the setup script for Setuptools:
# Then install it for Python 2.7 :
sudo python2.7
# Now install pip using the newly installed setuptools:
sudo easy_install-2.7 pip
# With pip installed you can now do things like this:
pip2.7 install [packagename]

# Install virtualenv for Python 2.7
sudo pip2.7 install virtualenv

# Check the system Python interpreter version
python --version
# This will show Python 2.7.8

# You may find that yum no longer works at this point, because yum depends on python2.6. To fix this, edit the yum script so that it uses python2.6:
sudo vim /usr/bin/yum
# Replace "#!/usr/bin/python" with "#!/usr/bin/python2.6"

# Installing R packages
# Run R and execute the following commands
install.packages(c('ape', 'biom', 'optparse', 'RColorBrewer', 'randomForest', 'vegan'))
biocLite(c('DESeq2', 'metagenomeSeq'))

3. Installing the latest QIIME release and its base dependencies with pip

sudo pip2.7 install numpy
sudo pip2.7 install qiime -v
# For users in China, pip may stall because of network restrictions. For example, if FastTree cannot be downloaded, fetch it over another network connection and place the installation package at a local address. Next, download qiime-1.9.1.tar.gz and change the description of FastTree inside that archive. After modifying qiime-1.9.1.tar.gz, place it at your local address as well. Finally, run sudo pip2.7 install qiime -v -i [local address]

# Installing QIIME 1.9.1's dependencies
# Download the zip packages of 'qiime deploy' and 'qiime deploy conf' from GitHub
mkdir ~/qiime_software
cd qiime-deploy-master
sudo python2.7 ~/qiime_software/ -f ~/qiime-deploy-conf/qiime/qiime-1.9.1/qiime.conf --force-remove-failed-dirs
# After this step, it displays lists of 'Packages deployed successfully', 'Packages skipped' and 'Packages failed to deploy'
source ~/qiime_software/ -tf

# If some packages were not installed, install them manually.
# For example, usearch and AmpliconNoise failed to install.

# Installing usearch manually
# Download USEARCH v5.2.236
# Move the binary file into /usr/bin, rename it to usearch, then run chmod 755 [the binary file]

# Installing AmpliconNoise manually
# Download AmpliconNoiseV1.27.tar.gz
tar -xvzf AmpliconNoiseV1.27.tar.gz
cd AmpliconNoiseV1.27
make clean
make install
echo "export PATH=$HOME/AmpliconNoiseV1.27/Scripts:$HOME/AmpliconNoiseV1.27/bin:$PATH" >> $HOME/.bashrc
echo "export PYRO_LOOKUP_FILE=$HOME/AmpliconNoiseV1.27/Data/LookUp_E123.dat" >> $HOME/.bashrc
echo "export SEQ_LOOKUP_FILE=$HOME/AmpliconNoiseV1.27/Data/Tran.dat" >> $HOME/.bashrc

# PATH Environment Variable
echo "export PATH=$HOME/bin/:$PATH" >> $HOME/.bashrc
source $HOME/.bashrc

# Final verification
source ~/qiime_software/ -tf


1. DNS records for the mail server


Record type Host record Record value
A           mail
MX          @ 
TXT         @           v=spf1 a mx -all

2. Postfix configuration on CentOS

Then edit the Postfix configuration file /etc/postfix/ on the CentOS system, changing the following settings:

myhostname =
mydomain =
myorigin = $mydomain
inet_interfaces = all
inet_protocols = ipv4
mydestination = $myhostname, localhost.$mydomain, localhost, $mydomain
mynetworks =,, hash:/etc/postfix/access
relay_domains = $mydestination
home_mailbox = Maildir/
mail_spool_directory = /var/spool/mail
message_size_limit = 52428800

Then run the following commands to start the Postfix service:

# postmap hash:/etc/postfix/access 
# postalias hash:/etc/aliases
# /etc/init.d/postfix check
# /etc/init.d/postfix restart

3. Sending e-mail with the mail command


-s subject
-a attachment
-c address
-b address


$ mail -s "a e-mail subject" -a ./test.tar.gz < mail_content
$ cat mail_content | mail -s "a e-mail subject" -a ./test.tar.gz
$ echo "mail_content" | mail -s "a e-mail subject" -a ./test.tar.gz
$ mail -s "a e-mail subject" -a ./test.tar.gz
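
The same kind of message (subject, body, one attachment, optional Cc and Bcc) can also be put together from a script. The following is a minimal Python sketch using only the standard library; the function names build_message and send_via_local_postfix are hypothetical, as are the example.org addresses, and it assumes the Postfix server configured above accepts mail on localhost port 25.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(sender, recipient, subject, body,
                  attachment=None, cc=None, bcc=None):
    """Build a message comparable to: mail -s SUBJECT -a FILE -c CC -b BCC."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    if cc:
        message["Cc"] = cc
    if bcc:
        # smtplib delivers to Bcc addresses but does not transmit the header.
        message["Bcc"] = bcc
    message.set_content(body)
    if attachment is not None:
        path = Path(attachment)
        message.add_attachment(
            path.read_bytes(),
            maintype="application",
            subtype="octet-stream",
            filename=path.name,
        )
    return message


def send_via_local_postfix(message, host="localhost", port=25):
    """Hand the message to the local SMTP server (assumed to be Postfix)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(message)
```

A call such as send_via_local_postfix(build_message("me@example.org", "you@example.org", "a e-mail subject", "mail_content", attachment="./test.tar.gz")) would then correspond to the shell commands above.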


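Compiling or running recent versions requires newer versions of GCC and wxWidgets, so using a recent FileZilla release is not recommended. The FileZilla website only provides a download link for the latest version; older versions can be downloaded from SourceForge.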

$ wget
$ tar jxf FileZilla_3.5.3_x86_64-linux-gnu.tar.bz2 -C /opt/
$ echo 'PATH=$PATH:/opt/FileZilla3/bin/' >> ~/.bashrc
$ source ~/.bashrc 
$ filezilla



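1. Name of the large sequence (object)
2. Start on the large sequence (object_begin)
3. End on the large sequence (object_end)
4. Ordinal number of this piece along the large sequence (part_number)
5. Type of this piece (component_type)
    The common values are W, N, and U. W denotes a WGS contig; N denotes a gap of specified size; U denotes a gap of unknown length, conventionally written as 100 bp.
6. ID of the small sequence, or the gap length (component_id or gap_length)
7. Start on the small sequence, or the gap type (component_begin or gap_type)
    If column 5 is N or U, this column gives the gap type. The usual value is scaffold, a gap between two contigs within a scaffold. Other values are: contig, an unspanned gap between two contig sequences (since there is no evidence of linkage across such a gap, the large sequence should be broken there); centromere, a gap for the centromere; short_arm, a gap inserted at the start of an acrocentric chromosome; heterochromatin, a gap inserted for an especially large region of heterochromatic sequence; telomere, a gap inserted for the telomere; repeat, an unresolvable repeat.
8. End on the small sequence, or whether the gap is linked (component_end or linkage)
9. Orientation of the small sequence, or how the gap is linked (orientation or linkage_evidence)
    If column 5 is neither N nor U, this column is the orientation of the small sequence. Its common values are +, -, or ?.
    If column 5 is N or U, this column gives the type of evidence that links the two neighbouring small sequences. The usual value is paired-ends, meaning that paired reads join the small sequences. Other values are: na, used when column 8 is no; align_genus, linked by alignment to a reference genome of the same genus; align_xgenus, linked by alignment to a reference genome of another genus; align_trnscpt, linked by alignment to transcript sequences of the same species; within_clone, the sequences on both sides of the gap come from the same clone but no paired-ends span the gap, so the order and orientation of the flanking small sequences cannot be determined; clone_contig, linkage is provided by a clone contig in the tiling path (TPF); map, linkage determined from a linkage map, optical map, or similar method; strobe, linkage derived from PacBio sequences; unspecified. If there are several kinds of evidence, list them all, separated by semicolons.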
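
The nine columns can be read into named fields with a few lines of code. The sketch below is a minimal Python illustration, not an official AGP validator: the function name parse_agp_line is hypothetical, the two sample lines are invented, and it assumes tab-separated columns with the field names used in the list above.

```python
GAP_TYPES = {"N", "U"}  # component types whose columns 6-9 describe a gap


def parse_agp_line(line):
    """Split one AGP data line into a dict keyed by the column names."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    record = {
        "object": fields[0],
        "object_begin": int(fields[1]),
        "object_end": int(fields[2]),
        "part_number": int(fields[3]),
        "component_type": fields[4],
    }
    if fields[4] in GAP_TYPES:
        # Gap line: length, gap type, linkage, and a list of evidence types.
        record["gap_length"] = int(fields[5])
        record["gap_type"] = fields[6]
        record["linkage"] = fields[7]
        record["linkage_evidence"] = fields[8].split(";")
    else:
        # Component line: ID, coordinates on the component, orientation.
        record["component_id"] = fields[5]
        record["component_begin"] = int(fields[6])
        record["component_end"] = int(fields[7])
        record["orientation"] = fields[8]
    return record


# Invented lines, for illustration only.
contig_line = "scaffold_1\t1\t5000\t1\tW\tcontig_1\t1\t5000\t+"
gap_line = "scaffold_1\t5001\t5100\t2\tU\t100\tscaffold\tyes\tpaired-ends"
contig = parse_agp_line(contig_line)
gap = parse_agp_line(gap_line)
```

Keeping gap lines and component lines in differently keyed records makes it harder to mistake a gap length for a component ID later on.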

Scaffold from component (WGS)
Chromosome from scaffold (WGS)