Adjust P-values for Multiple Comparisons

Reference: Adjust P-values for Multiple Comparisons

Description

Given a set of p-values, returns p-values adjusted using one of several methods.

Usage

p.adjust(p, method = p.adjust.methods, n = length(p))

p.adjust.methods
# c(“holm”, “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”,
# “fdr”, “none”)

Arguments

p numeric vector of p-values
n number of comparisons, must be at least length(p); only set this (to non-default) when you know what you are doing!

Details

The adjustment methods include the Bonferroni correction (“bonferroni”) in which the p-values are multiplied by the number of comparisons. Less conservative corrections are also included by Holm (1979) (“holm”), Hochberg (1988) (“hochberg”), Hommel (1988) (“hommel”), Benjamini & Hochberg (1995) (“BH” or its alias “fdr”), and Benjamini & Yekutieli (2001) (“BY”), respectively. A pass-through option (“none”) is also included. The set of methods are contained in the p.adjust.methods vector for the benefit of methods that need to have the method as an option and pass it on to p.adjust.

The first four methods are designed to give strong control of the family-wise error rate. There seems no reason to use the unmodified Bonferroni correction because it is dominated by Holm’s method, which is also valid under arbitrary assumptions.

Hochberg’s and Hommel’s methods are valid when the hypothesis tests are independent or when they are non-negatively associated (Sarkar, 1998; Sarkar and Chang, 1997). Hommel’s method is more powerful than Hochberg’s, but the difference is usually small and the Hochberg p-values are faster to compute.

The “BH” (aka “fdr”) and “BY” method of Benjamini, Hochberg, and Yekutieli control the false discovery rate, the expected proportion of false discoveries amongst the rejected hypotheses. The false discovery rate is a less stringent condition than the family-wise error rate, so these methods are more powerful than the others.

Note that you can set n larger than length(p) which means the unobserved p-values are assumed to be greater than all the observed p for “bonferroni” and “holm” methods and equal to 1 for the other methods.

Value

A numeric vector of corrected p-values (of the same length as p, with names copied from p).

GFOLD的安装与使用

一. GFOLD简介

GFOLD是由同济大学生物信息系张勇教授课题组研发的软件。gfold主要用于RNA-seq数据的差异表达基因分析,特别适合于实验没有重复的情况。

二. GFOLD的安装

GFOLD的安装需要安装gsl-1.15

安装gsl-1.15
$ wget ftp://ftp.gnu.org/gnu/gsl/gsl-1.15.tar.gz
$ tar zxf gsl-1.15.tar.gz
$ cd gsl-1.15
$ ./configure
$ make -j 8
$ sudo make install

安装gfold
$ wget https://bitbucket.org/feeldead/gfold/get/e78560195469.zip
$ unzip e78560195469.zip
$ cd feeldead-gfold-e78560195469
$ make
$ cd ../ ; mv feeldead-gfold-e78560195469 gfold

三. GFOLD的使用

使用gfold,发现结果没有cufflinks得出的结果好。该软件相对简单快速。不推荐使用。

ssh不提示将ip加入到known_hosts

需要使用ssh连接到指定的主机,然后运行一个命令后退出。但是该主机每次重新联网都会变化IP(PPOE上网).这时每次ssh的连接默认情况下会询问是否将主机ip永远加入到本地的~/.ssh/know_hosts中。

这种情况下则所需要的命令不会运行。比如运行程序:

$ ssh user@ip_address "mplayer ~/musics/audio/alarms/Beeps.mp3"

以上命令的目是主机上的声音。将此命令在远端服务器运行,来播放本地机上的一个提醒铃声。但是本地主机没有固定IP的时候,就不会有提醒。于是进行如下设置:

更改/etc/ssh/ssh_config文件

StrictHostKeyChecking no

只根据两个转录组数据分析差异表达基因

1. 数据

手头有两个转录组数据,分别是某一真菌物种(该物种没有基因组数据)的双核菌丝阶段和子实体阶段的转录组数据。测序平台是Illumina Hiseq2000;插入片段长度200bp,测序的reads长度90bp。得到的数据文件为:

/home/user/RNA-seq/mycelium_reads1.fastq
/home/user/RNA-seq/mycelium_reads2.fastq
/home/user/RNA-seq/fruitingbody_reads1.fastq
/home/user/RNA-seq/fruitingbody_reads2.fastq

2. 数据的预处理

2.1. 去除N的比例大于5%的reads;去除低质量reads(质量值Q≤20的碱基数占整个read的50%以上);
2.2. 根据FastQC对上述过滤后的reads的质量检测,去除reads首尾各10bp的碱基,得到的预处理数据为:

/home/user/RNA-seq/clean_reads/mycelium_reads1.fastq
/home/user/RNA-seq/clean_reads/mycelium_reads2.fastq
/home/user/RNA-seq/clean_reads/fruitingbody_reads1.fastq
/home/user/RNA-seq/clean_reads/fruitingbody_reads2.fastq

3. 使用Trinity和TGICL进行转录组的组装

3.1. 对mycelium数据和fruitingbody数据分别进行转录组的组装

$ pwd
/home/user/RNA-seq/

$ mkdir assembly
$ cd assembly

$ Trinity.pl --jaccard_clip --seqType fq --JM 50G --SS_lib_type FR --CPU 24 --inchworm_cpu 24 --bflyCPU 24 --group_pairs_distance 500 -min_contig_length 200 --output mycelium_contig --left /home/user/RNA-seq/clean_reads/mycelium_reads1.fastq --right /home/user/RNA-seq/clean_reads/mycelium_reads2.fastq
$ Trinity.pl --jaccard_clip --seqType fq --JM 50G --SS_lib_type FR --CPU 24 --inchworm_cpu 24 --bflyCPU 24 --group_pairs_distance 500 -min_contig_length 200 --output fruitingbody_contig --left /home/user/RNA-seq/clean_reads/fruitingbody_reads1.fastq --right /home/user/RNA-seq/clean_reads/fruitingbody_reads2.fastq

当然,需要将生成的两个转录组的Trinity.fasta序列按长度进行排序;统一更改的fasta
头名称;更改fasta文件名

$ cp mycelium_contig/Trinity.fasta mycelium_contigs.fasta
$ cp fruitingbody_contig/Trinity.fasta fruitingbody_contigs.fasta

3.2. 使用TGICL将两个转录组序列合并

$ pwd
/home/user/RNA-seq/assembly

$ mkdir all_tissue_contig
$ cd all_tissue_contig

$ cat ../mycelium_contigs.fasta ../fruitingbody_contigs.fasta > all.contigs.fasta
$ tgicl -F all.contigs.fasta

tgicl生成了两个有用的文件: asm_1/contigs  和 all.contigs.fasta.single
tons。其中前者是聚类后的contigs结果;后者是没有聚类的单独的contigs的序列名,需
要分别到../mycelium_contigs.fasta 和 ../fruitingbody_contigs.fasta文
件中提取出相应的序列: all.contigs.fasta.singletons.mycelium.fasta 和 
all.contigs.fasta.singletons.fruitingbody.fasta

$ cat asm_1/contigs all.contigs.fasta.singletons.mycelium.fasta all.contigs.fasta.singletons.fruitingbody.fasta > all_contigs.fasta

在对all_contigs.fasta进行fasta头的重命名,并将序列按长度排序;同时要得到all_
contigs.fasta中的序列和mycelium_contigs.fasta,fruitingbody_contigs.
fasta中序列的对应关系。

$ cp all_contigs.fasta ../

3.3. 至此得出转录组的组装结果:

/home/user/RNA-seq/assembly/all_contigs.fasta

4. 使用cufflinks来分析差异表达基因

4.1 使用tophat来将预处理后的reads比对到转录组序列上

$ pwd
/home/user/RNA-seq/

$ mkdir cufflinks
$ cd cufflinks

$ bowtie2-build ../all_contigs.fasta all_contigs

$ tophat -o tophat_out_mycelium -r 60 --mate-std-dev 80 -p 24 --library-type fr-unstranded all.contigs /home/user/RNA-seq/clean_reads/mycelium_reads1.fastq /home/user/RNA-seq/clean_reads/mycelium_reads2.fastq
$ tophat -o tophat_out_fruitingbody -r 60 --mate-std-dev 80 -p 24 --library-type fr-unstranded all.contigs /home/user/RNA-seq/clean_reads/fruitingbody_reads1.fastq /home/user/RNA-seq/clean_reads/fruitingbody_reads2.fastq

4.2 自制一个transcripts.gtf文件
由于cuffdiff运行需要一个参考序列的transcripts.gtf文件: 该文件有9列,使用tab分隔;使exon的范围为整个contig。其格式如:

AA.auricula_all_contig_1     chenlianfu  exon  1  9401  .  .  .  gene_id "A.auricula_all_contig_1"; transcript_id "A.auricula_all_contig_1";
A.auricula_all_contig_10     chenlianfu  exon  1  4464  .  .  .  gene_id "A.auricula_all_contig_10"; transcript_id "A.auricula_all_contig_10";
A.auricula_all_contig_100    chenlianfu  exon  1  3090  .  .  .  gene_id "A.auricula_all_contig_100"; transcript_id "A.auricula_all_contig_100";
A.auricula_all_contig_1000   chenlianfu  exon  1  1768  .  .  .  gene_id "A.auricula_all_contig_1000"; transcript_id "A.auricula_all_contig_1000";
A.auricula_all_contig_10000  chenlianfu  exon  1  586   .  .  .  gene_id "A.auricula_all_contig_10000"; transcript_id "A.auricula_all_contig_10000"; 
A.auricula_all_contig_10001  chenlianfu  exon  1  586   .  .  .  gene_id "A.auricula_all_contig_10001"; transcript_id "A.auricula_all_contig_10001";

4.3 使用cuffdiff来分析转录子的表达量和差异表达基因

$ pwd
/home/user/RNA-seq/cufflinks

$ cuffdiff -L mycelium,fruitingbody --library-type fr-unstranded -p 8 -o cuffdiff ./transcriptome.gtf ./tophat_out_mycelium/accepted_hits.bam ./tophat_out_fruitingbody/accepted_hits.bam

至此得出转录子的表达量数据和差异表达分析

/home/user/RNA-seq/cufflinks/cuffdiff/gene_exp.diff

基因组组装软件心得

1. SOAPdenovo 和 Velvet组装基因组是否需要对reads进行预处理?

使用soapdenovo组装基因组,需要先对reads进行预处理,去掉质量不好的reads,剪掉3’端和5’端的一些碱基;然后再进行组装。而使用velvet时候,则不需要,可能是由于velvet的cutoff比较好。
这个结论得出方法:将reads进行预处理前后分别用于组装出基因组,然后再使用bowtie2将两种reads比对到两种基因组组装结果,观察reads比对上的百分比得出。reads的预处理方法是去除低质量reads并将100bp的reads前后各减去5bp碱基。
对于soapdenovo:

                      未处理reads比对率         预处理reads比对率
未处理reads基因组          65.84%                   66.10%
预处理reads基因组          66.68%                   66.99%

对于velvet:

                      未处理reads比对率         预处理reads比对率
未处理reads基因组          86.41%                   88.10%
预处理reads基因组          70.09%                   71.62%

通过以上的对比和数据可以看出:1.velvet组装出的基因组的精确性要比soapdenovo高;2.对reads进行预处理对soapdenovo组装的基因组的精确性影响较小,预处理reads后,组装出的基因组精确性稍有提高;3.使用velvet组装基因组时,不对reads进行预处理组装出的基因组精确性高很多。
讨论:1.数据量不够多,碱基覆盖度只有20~25X, 以上结论具有一定的片面性; 2.velvet在对kmer进行cutoff的算法很棒,因而组装出的基因组精确性高。

Velvet的安装和使用

一. Velvet简介

VelvetEBIDaniel Zerbino and Ewan Birney 开发的利用short reads组装基因组的软件。
Velvet作为一种de novo基因组组装软件,可利用的reads包括 Illumina reads 和 454 reads。
引用文献(2008):Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

二. Velvet下载与安装

make ‘CATEGORIES=57’ ‘MAXKMERLENGTH=75’ ‘BIGASSEMBLY=1’ ‘LONGSEQUENCES=1’ ‘OPENMP=1’ ‘BUNDLEDZLIB=1’

远程服务器的命令——到本地声音提醒

使用ssh连接远程服务器后。每当运行一个较长的程序时候,不想一直等待,需要该程序结束后能有声音提醒。这时候可以参考我的使用方法:

1. 下载一些提醒铃声到本地文件中

本地一个铃声的存放路径:
~/musics/audio/alarms/Beeps.mp3

2. 和远程服务器相互设置无密码的ssh登录

参考我以前的一篇文章:CentOs配置OpenSSH无密码登陆

3. 编写脚本alarm.pl放置于服务器的系统路径下

#!/usr/bin/perl
use strict;

my $whoami = `whoami`;
$whoami =~ s/\s*//g;

my $lastlog = `lastlog`;

my $ip_address;

if ( $lastlog =~ m/^$whoami.*?(\d+\.\d+\.\d+\.\d+)/m ) {
	$ip_address = $1;
} else {
	die;
}

print "The ip_address is $ip_address\n";

my $commond = "ssh chenlianfu\@$ip_address \"mplayer -loop 0 ~/musics/audio/alarms/Beeps.mp3\"";
print "$commond\n";

`$commond`;

4. 每次执行运算时间较长的程序时则紧跟着运行 alarm.pl 则会在程序运行完毕后在本地计算机上提醒。

linux下ISO文件的制作,刻录和挂载

1. ISO文件的刻录

手头上有iso文件,比如CentOS-6.4-x86_64-bin-DVD1.iso,需要刻录成光碟,然后用于装系统用。使用 cdrecord 命令。

$ cdrecord dev=/dev/cdrw CentOS-6.4-x86_64-bin-DVD1.iso

常用参数:
-v | -verbose
speed=<int>
    指定刻录机速度,一般不用,因为cdrecord会自动检测最佳刻录速度
dev=<target>
    光驱的位置

2. ISO文件的制作

制作ISO文件可以是用 dd 或 mkisofs 命令。

2.1 dd用法

当直接从光驱中压制ISO文件时:

$ dd if=/dev/cdrw of=output.iso bs=1024

2.2 mkisofs用法

当将一个文件或目录压制到ISO文件时:

$ mkisofs -r -o output.iso folder_name/

3. ISO文件的挂载

$ sudo mount -o loop file.iso /mnt

wordpress插件

1. Akismet

用途:博客垃圾评论 / trackback 屏蔽解决方案。效果非常好。
需要的Akismet API 密钥:c28ea9690585

2. WP-Chinese-Excerpt

用途:在主页上只显示中文摘要。开启该插件则会有效果。编辑文章的时候最好写个摘要。其原版插件名为:WP-UTF8-Excerpt