{"id":2335,"date":"2015-08-25T17:01:18","date_gmt":"2015-08-25T09:01:18","guid":{"rendered":"http:\/\/www.chenlianfu.com\/?p=2335"},"modified":"2015-08-25T17:01:33","modified_gmt":"2015-08-25T09:01:33","slug":"%e4%bd%bf%e7%94%a8-gce-%e8%bf%9b%e8%a1%8c%e5%9f%ba%e5%9b%a0%e7%bb%84%e5%a4%a7%e5%b0%8f%e8%af%84%e4%bc%b0","status":"publish","type":"post","link":"http:\/\/www.chenlianfu.com\/?p=2335","title":{"rendered":"Estimating Genome Size with GCE"},"content":{"rendered":"<h1>1. Introduction to GCE<\/h1>\n<p>GCE (Genome Characteristics Estimation) is software from BGI for estimating genome characteristics. Reference: <a href=\"http:\/\/www.researchgate.net\/publication\/255722390_Estimation_of_genomic_characteristics_by_analyzing_k-mer_frequency_in_de_novo_genome_projects\" target=\"_blank\">Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects<\/a>. Download: <a href=\"ftp:\/\/ftp.genomics.org.cn\/pub\/gce\" target=\"_blank\">ftp:\/\/ftp.genomics.org.cn\/pub\/gce<\/a>.<\/p>\n<p>The GCE package mainly contains two programs, kmer_freq_hash and gce. The former counts k-mer frequencies; the latter uses those counts to produce an accurate estimate of genome size.<\/p>\n<h1>2. Downloading and installing GCE<\/h1>\n<pre>\r\n$ wget ftp:\/\/ftp.genomics.org.cn\/pub\/gce\/gce-1.0.0.tar.gz\r\n$ tar zxf gce-1.0.0.tar.gz -C \/opt\/biosoft\r\n<\/pre>\n<h1>3. 
Using kmer_freq_hash<\/h1>\n<p>A typical kmer_freq_hash command:<\/p>\n<pre>\r\n$ \/opt\/biosoft\/gce-1.0.0\/kmerfreq\/kmer_freq_hash\/kmer_freq_hash \\\r\n  -k 21 -l reads.list -a 10 -d 10 -t 24 -i 50000000 -o 0 -p species &> kmer_freq.log\r\n<\/pre>\n<p>Commonly used kmer_freq_hash options:<\/p>\n<pre>\r\n-k &lt;17&gt;\r\n    k-mer size. Valid values are 9 to 27; the default is 17.\r\n-l string\r\n    A plain-text list file in which each line is the path to one fastq file.\r\n-t int\r\n    Number of threads to use; the default is 1.\r\n-i int\r\n    Initial hash table size; the default is 1048576. It is best set to (number of distinct k-mers \/ 0.75) \/ number of threads. For example, with a 100M genome, 40M reads of 100 bp, a 1% sequencing error rate and a k-mer size of 21, the number of distinct k-mers is 100M+40M*100*1%*21=940M; with 24 threads, set i=940M\/0.75\/24=52222222.\r\n-p string\r\n    Prefix for the output files.\r\n-o int\r\n    Whether to output the k-mer sequences. 1: yes, 0: no; the default is 1. Choosing 0 is recommended to save run time.\r\n-q int\r\n    phred format of the fastq files; the default is 64. The value can be 33 or 64.\r\n-c double\r\n    Minimum k-mer accuracy, a value in the range 0 to 0.99, or -1. -1 means k-mers are not filtered. Setting a higher accuracy filters out low-quality k-mers. The accuracy is computed from the phred-format base qualities.\r\n-r int\r\n    Read length used when extracting k-mers. By default the full length of each read is used.\r\n-a int\r\n    Ignore this many bases at the start of each read.\r\n-d int\r\n    Ignore this many bases at the end of each read.\r\n-g int\r\n    Use only this many bases to extract k-mers. By default all bases are used.\r\n<\/pre>\n<p>The main output file of kmer_freq_hash is species.freq.stat. It has 2 columns: column 1 is the number of times a k-mer occurs, and column 2 is the number of distinct k-mers with that count. The file has 255 lines; line 255 gives the total number of distinct k-mers that occur &gt;=255 times. This file is the input file for gce.<br \/>\nThe messages kmer_freq_hash prints to the screen are saved in kmer_freq.log. That file contains a rough estimate of the genome size. Its Kmer_individual_num value is passed to gce as an input parameter (the -g option).<\/p>\n<h1>4. 
Using gce<\/h1>\n<p>A typical gce command:<\/p>\n<pre>\r\n$ \/opt\/biosoft\/gce-1.0.0\/gce \\\r\n  -f species.freq.stat -c 85 -g 4112118028 -m 1 -D 8 -b 1 > species.table 2> species.log\r\n<\/pre>\n<p>Commonly used gce options:<\/p>\n<pre>\r\n-f string\r\n    kmer depth frequency file\r\n-c int\r\n    The depth at the main peak of the kmer depth frequency distribution. gce searches for the main peak near this value.\r\n-g int\r\n    Total number of k-mers. Always set this value; otherwise gce computes the total directly from the file given with -f. Because the maximum depth in that file is 255 by default, the value the software computes by itself is smaller than the true value. Note also that this value includes low-coverage k-mers.\r\n-M int\r\n    Maximum supported depth value; the default is 256.\r\n-m int\r\n    Estimation model: discrete (0) or continuous (1). The default is 0; 1 is recommended for real data.\r\n-D int\r\n    precision of expect value; the default is 1. With -m 1, setting this to 8 is recommended.\r\n-H int\r\n    Use heterozygous mode (1) or not (0). The default is 0. Choose 1 only when there is an obvious heterozygous peak.\r\n-b int\r\n    Whether the data has bias (1) or not (0). When K > 19, set -b 1.\r\n<\/pre>\n<p>gce writes its results to 
species.table and species.log. The main content of species.log:<\/p>\n<pre>\r\nraw_peak\tnow_node\tlow_kmer\tnow_kmer\tcvg\tgenome_size\ta[1]\tb[1]\r\n84\t35834245\t22073804\t4044916750\t84.6637\t4.83093e+07\t0.928318\t0.637648\r\n\r\nraw_peak: the coverage depth with the largest number of distinct k-mers (here 84), i.e. the main peak.\r\nnow_node: the number of distinct k-mers.\r\nlow_kmer: the number of low-coverage k-mers.\r\nnow_kmer: the number of k-mers after removing low-coverage k-mers; this value = (total k-mer number given with -g) - low_kmer.\r\ncvg: the estimated average coverage.\r\ngenome_size: the genome size; this value = now_kmer \/ cvg.\r\na[1]: the fraction of distinct k-mers (k-mer species) that occur only once in the genome.\r\nb[1]: the fraction of k-mer individuals that occur only once in the genome. This value represents the proportion of the genome made up of single-copy sequence.\r\n<\/pre>\n<p>With the -H 1 option, the following additional information is reported:<\/p>\n<pre>\r\nfor hybrid: a[1\/2]=0.223671 a1=0.49108\r\nkmer-species heterozygous ratio is about 0.125918\r\n\r\nIn the result above, 0.125918 is computed from a[1\/2]: 0.125918 = a[1\/2] \/ (2 - a[1\/2]).\r\na[1\/2]=0.223671 means that, among all unique k-mers, a fraction of 0.223671 are heterozygous k-mers.\r\n\r\nIn addition, the values of a[1\/2] and b[1\/2] appear in the final statistics. Repeat content = 1 - b[1\/2] - b[1]
.\r\n<\/pre>\n<p>The heterozygosity rate is then 0.125918 \/ kmer_size. If the computed heterozygosity rate is below 0.2%, my own view is that the sequenced sample should be treated as homozygous. In that case the -H 1 option should not be used, because -H 1 affects the estimates of genome size and repeat content.<\/p>\n<h1>5. kmer species and kmer individuals plots at different heterozygosity rates, with and without repeats<\/h1>\n<p>In the figure below, panels a and b show the curves for an idealized repeat-free genome at different heterozygosity rates;<br \/>\npanels c and d show the curves for a genome with repeats (human) at different heterozygosity rates.<br \/>\nThe figure can be used as a reference for how the curves look at different heterozygosity rates.<br \/>\n<img src=\"http:\/\/www.chenlianfu.com\/data\/pictures\/kmer_picture.png\" alt=\"kerm_pictures\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. 
Introduction to GCE GCE (Genome Characteristics Est &hellip; <a href=\"http:\/\/www.chenlianfu.com\/?p=2335\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[39,40],"_links":{"self":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/2335"}],"collection":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2335"}],"version-history":[{"count":2,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/2335\/revisions"}],"predecessor-version":[{"id":2337,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/2335\/revisions\/2337"}],"wp:attachment":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2335"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2335"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2335"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}