{"id":1122,"date":"2013-05-02T09:56:33","date_gmt":"2013-05-02T01:56:33","guid":{"rendered":"http:\/\/www.hzaumycology.com\/chenlianfu_blog\/?p=1122"},"modified":"2013-05-02T10:03:06","modified_gmt":"2013-05-02T02:03:06","slug":"%e5%88%a9%e7%94%a8%e8%b6%85%e5%87%a0%e4%bd%95%e6%a6%82%e7%8e%87%e5%88%86%e5%b8%83%e5%81%9a%e5%af%8c%e9%9b%86%e5%88%86%e6%9e%90","status":"publish","type":"post","link":"http:\/\/www.chenlianfu.com\/?p=1122","title":{"rendered":"\u5229\u7528\u8d85\u51e0\u4f55\u6982\u7387\u5206\u5e03\u505a\u5bcc\u96c6\u5206\u6790"},"content":{"rendered":"<h1>1. \u4e86\u89e3\u8d85\u51e0\u4f55\u6982\u7387\u5206\u5e03<\/h1>\n<p>\u53c2\u8003\uff1a<a href=\"http:\/\/stat.ethz.ch\/R-manual\/R-patched\/library\/stats\/html\/Hypergeometric.html\" target=\"_blank\">The Hypergeometric Distribution<\/a><br \/>\n\u8d85\u51e0\u4f55\u5206\u5e03\u7684\u516c\u5f0f\u4e3a\uff1a<\/p>\n<pre>p(x) = choose(m, x) choose(n, k-x) \/ choose(m+n, k)\r\nfor x = 0, \u2026, k.\r\n\u5176\u4e2d, m \u662f\u6876\u91cc\u9762\u767d\u7403\u7684\u4e2a\u6570, n \u662f\u9ed1\u7403\u7684\u4e2a\u6570, k \u662f\u4ece\u6876\u4e2d\u968f\u673a\u53d6\u51fa\u7684\u7403\u6570, x \u662f\u53d6\u51fa\u7403\r\n\u4e2d\u767d\u7403\u7684\u4e2a\u6570.<\/pre>\n<p>Fisher\u2018s Exact Test\u5c31\u662f\u4f9d\u636e\u8d85\u51e0\u4f55\u6982\u7387\u5206\u5e03\u5f97\u5230\u7684\u3002<\/p>\n<h1>2. \u8d85\u51e0\u4f55\u5206\u5e03\u8fd0\u7b97\u65b9\u6cd5<\/h1>\n<p>\u4f7f\u7528R\u7684\u8fd0\u7b97\u65b9\u6cd5\uff1a<\/p>\n<pre>dhyper(x, m, n, k, log = FALSE)\r\n    \u8ba1\u7b97\u67d0\u4e00\u4e2a\u70b9\u7684p\u503c\r\nphyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)\r\n    \u8ba1\u7b97\u4e00\u4e2a\u5206\u5e03\u7684p\u503c\uff0c\u9ed8\u8ba4\u4e0b\u8ba1\u7b97P(X&lt;=x)\r\nqhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE)\r\n    \u8ba1\u7b97\u67d0\u4e00\u4e2ap\u503c\u5bf9\u5e94\u7684\u5206\u4f4d\u6570\r\nrhyper(nn, m, n, k)\r\n    \u6309\u8d85\u51e0\u4f55\u5206\u5e03\u7ed9\u51fann\u7684\u53ef\u80fd\u7684\u6a21\u62df\u7ed3\u679c\u503c<\/pre>\n<h1>3. \u5bf9\u4e00\u7ec4p\u503c\u8fdb\u884c\u6821\u6b63<\/h1>\n<p>\u53c2\u8003\uff1a<a href=\"http:\/\/stat.ethz.ch\/R-manual\/R-devel\/library\/stats\/html\/p.adjust.html\" target=\"_blank\">Adjust P-values for Multiple Comparisons<\/a><br \/>\n\u4f7f\u7528R\u7684\u8fd0\u7b97\u65b9\u6cd5\uff1a<\/p>\n<pre>p.adjust(p, method = p.adjust.methods, n = length(p))\r\n\r\np.adjust.methods\r\n# c(\u201cholm\u201d, \u201chochberg\u201d, \u201chommel\u201d, \u201cbonferroni\u201d, \u201cBH\u201d, \u201cBY\u201d,\r\n# \u201cfdr\u201d, \u201cnone\u201d)<\/pre>\n<h1>4. \u5229\u7528\u4e0a\u8ff0\u539f\u7406\u6765\u7f16\u5199perl\u7a0b\u5e8f\uff0c\u6765\u8fdb\u884cGO\u6216PATHWAY\u5bcc\u96c6\u5206\u6790<\/h1>\n<p>\u8f93\u5165\u4e24\u4e2a\u6587\u4ef6\uff1a\u7b2c1\u4e2a\u6587\u4ef6\u662f\u5404\u4e2a\u57fa\u56e0\u5bf9\u5e94\u7684GO\u5206\u7c7b\u6216KEGG\u5206\u7c7b\uff1b\u7b2c2\u4e2a\u6587\u4ef6\u662f\u5dee\u5f02\u8868\u8fbe\u57fa\u56e0\u7684\u540d\u79f0\u3002\u4ee5\u4e0b\u4e3aKEGG\u5bcc\u96c6\u5206\u6790\u7684perl\u7a0b\u5e8f\u3002<\/p>\n<pre>\u00a0#!\/usr\/bin\/perl\r\n\r\n=head1 Name\r\n\r\n\u00a0 kegg_enrich_analysis.pl\r\n\r\n=head1 Description\r\n\r\n\u00a0 This program is designed to do enrichment analysis by kegg results.\r\n\r\n=head1 Version\r\n\r\n\u00a0 Author: Chen Lianfu, chenllianfu@foxmail.com\r\n\u00a0 Version: 1.0,\u00a0 Date: Sat Apr 27 02:06:02 2013\r\n\r\n=head1 Usage\r\n\r\n\u00a0 --out\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 the output result name, default: enrichment_output\r\n\u00a0 --tmp\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 to keep the tmp files, default: false\r\n\u00a0 --fdr\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 set the false discovery rate, default: 0.05\r\n\u00a0 --kdn\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 When a kegg term was perpared for Fisher's exact \r\ntest, the i(the num of this kegg terms have so many differential \r\nexpression genes) should &gt;= this value, default: 3\r\n\u00a0 --help\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 output help information to screen\r\n\r\n=head1 Example\r\n\r\n\u00a0 perl ..\/kegg_enrich_analysis.pl annot_keggs genelist.txt\r\n\r\n=cut\r\n\r\nuse strict;\r\nuse Getopt::Long;\r\n\r\nmy ($out,$tmp,$fdr_threshold,$kegg2degs_num_threshold,$Help);\r\n\r\nGetOptions(\r\n\u00a0\u00a0 \u00a0\"out:s\"=&gt;\\$out,\r\n\u00a0\u00a0 \u00a0\"fdr:s\"=&gt;\\$fdr_threshold,\r\n\u00a0\u00a0 \u00a0\"kdn:s\"=&gt;\\$kegg2degs_num_threshold,\r\n\u00a0\u00a0 \u00a0\"tmp\"=&gt;\\$tmp,\r\n\u00a0\u00a0 \u00a0\"help\"=&gt;\\$Help\r\n);\r\n\r\n$out ||= 'enrichment_output';\r\n$fdr_threshold ||= 0.05;\r\n$kegg2degs_num_threshold ||= 3;\r\n\r\ndie `pod2text $0` if (@ARGV == 0 || $Help);\r\n\r\nmy $kegg2gene_file = shift @ARGV;\r\nmy $deglist = shift @ARGV;\r\n\r\nopen kegg2GENE, '&lt;', $kegg2gene_file or die $!;\r\nopen DEGLIST, '&lt;', $deglist or die $!;\r\n\r\nmy %kegg2genes;\r\nmy %gene2keggs;\r\nwhile ( &lt;kegg2GENE&gt; ) {\r\n\u00a0\u00a0 \u00a0next unless \/^(\\S+).*?(K\\d+)\/;\r\n\u00a0\u00a0 \u00a0$gene2keggs{$1} .= \"$2,\";\r\n\u00a0\u00a0 \u00a0$kegg2genes{$2}{\"genes\"} .= \"$1,\";\r\n\u00a0\u00a0 \u00a0$kegg2genes{$2}{\"num\"} += 1;\r\n}\r\n\r\nmy ( $total_deg_num, $deg_own_kegg_num );\r\nmy %deg_kegg2genes;\r\nwhile ( &lt;DEGLIST&gt; ) {\r\n\u00a0\u00a0 \u00a0chomp;\r\n\u00a0\u00a0 \u00a0my $genename = $_;\r\n\u00a0\u00a0 \u00a0$total_deg_num ++;\r\n\u00a0\u00a0 \u00a0if ( $gene2keggs{$genename} ) {\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0$deg_own_kegg_num ++;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0my @keggs = split \/,\/, $gene2keggs{$genename};\r\n\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0foreach ( @keggs ) {\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0$deg_kegg2genes{$_}{\"genes\"} .= \"$genename,\";\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0$deg_kegg2genes{$_}{\"num\"} ++;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 \u00a0}\r\n}\r\n\r\nprint \"$total_deg_num differential expression genes in total, and \r\n$deg_own_kegg_num genes own kegg terms.\\n\";\r\n\r\nmy @for_enrich_keggs = keys %deg_kegg2genes;\r\nopen FOR_R, '&gt;', \"tmp_4value\" or die $!;\r\n\r\nmy $kegg_num_for_fisher_test = @for_enrich_keggs;\r\nprint \"$kegg_num_for_fisher_test kegg terms is perpared for Fisher\r\n's exact test.\\n\";\r\n\r\nforeach my $kegg ( @for_enrich_keggs ) {\r\n\u00a0\u00a0 \u00a0my $n_and_m = keys %gene2keggs;\r\n\u00a0\u00a0 \u00a0my $N = $deg_own_kegg_num;\r\n\u00a0\u00a0 \u00a0my $n = $kegg2genes{$kegg}{\"num\"};\r\n\u00a0\u00a0 \u00a0my $i = $deg_kegg2genes{$kegg}{\"num\"};\r\n\u00a0\u00a0 \u00a0my $m = $n_and_m -$n;\r\n\r\n\u00a0\u00a0 \u00a0print FOR_R \"$kegg $i $n $m $N\\n\" if $i &gt;= $kegg2degs_num_threshold;\r\n}\r\n\r\nopen R_SCRIPT, '&gt;', \"phyper_adjust.R\" or die $!;\r\n\r\nprint R_SCRIPT \r\n'phy &lt;- read.table(file=\"tmp_4value\")\r\npvalue &lt;- phyper(phy$V2,phy$V3,phy$V4,phy$V5,lower.tail=FALSE)\r\nqvalue &lt;- p.adjust(pvalue,method=\\'fdr\\')\r\nvalue &lt;- cbind ( pvalue, qvalue )\r\nrownames(value)=phy$V1\r\nvalue\r\n';\r\n\r\nmy @fdr = `cat phyper_adjust.R | R --vanilla --slave`;\r\n\r\n#print @fdr;\r\n\r\n`rm phyper_adjust.R tmp_4value` unless $tmp;\r\n\r\nopen OUTPUT, '&gt;', $out or die $!;\r\n\r\nforeach ( @fdr ) {\r\n\u00a0\u00a0 \u00a0next unless \/(\\S+)\\s+(\\S+)\\s+(\\S+)\/;\r\n\u00a0\u00a0 \u00a0my $kegg = $1; my $pvalue = $2; my $fdr = $3;\r\n\r\n\u00a0\u00a0 \u00a0if ( $fdr &lt;= $fdr_threshold ) {\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0my $n_and_m = keys %gene2keggs;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0my $N = $deg_own_kegg_num;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0my $n = $kegg2genes{$kegg}{\"num\"};\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0my $i = $deg_kegg2genes{$kegg}{\"num\"};\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0my $m = $n_and_m -$n;\r\n\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0my $genes = $deg_kegg2genes{$kegg}{\"genes\"};\r\n\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0print OUTPUT \"$kegg\\t$i\\t$N\\t$n\\t$n_and_m\\t$pvalue\\t$fdr\\t$genes\\n\";\r\n\u00a0\u00a0 \u00a0}\r\n}<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>1. \u4e86\u89e3\u8d85\u51e0\u4f55\u6982\u7387\u5206\u5e03 \u53c2\u8003\uff1aThe Hypergeometric Distr &hellip; <a href=\"http:\/\/www.chenlianfu.com\/?p=1122\">\u7ee7\u7eed\u9605\u8bfb <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/1122"}],"collection":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1122"}],"version-history":[{"count":5,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/1122\/revisions"}],"predecessor-version":[{"id":1126,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/1122\/revisions\/1126"}],"wp:attachment":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1122"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}