{"id":1260,"date":"2013-05-08T15:52:38","date_gmt":"2013-05-08T07:52:38","guid":{"rendered":"http:\/\/www.hzaumycology.com\/chenlianfu_blog\/?p=1260"},"modified":"2013-10-23T08:49:22","modified_gmt":"2013-10-23T00:49:22","slug":"augustus-training","status":"publish","type":"post","link":"http:\/\/www.chenlianfu.com\/?p=1260","title":{"rendered":"Augustus training"},"content":{"rendered":"<p><a href=\"http:\/\/bioinf.uni-greifswald.de\/augustus\/binaries\/tutorial\/training.html\" target=\"_blank\">Augustus training<\/a><\/p>\n<h1>1. Convert the GFF file to a GenBank format file<\/h1>\n<pre>\r\n$ $AugustusHome\/scripts\/gff2gbSmallDNA.pl PASA.gff genome.fa 1000 genes.raw.gb\r\n<\/pre>\n<p>This converts the GFF file to GenBank format, adding 1000 bp of flanking sequence on each side. The GFF file can be obtained by using PASA to align RNA-Seq transcripts to the genome. A GFF file from PASA also carries 5&#8217; UTR annotations, but training ignores this information: it is sufficient to have only the coding parts of the gene structure (CDS).<\/p>\n<p>GenBank files can also be retrieved by searching the NCBI nucleotide database.<\/p>\n<h1>2. 
Remove the problematic loci from genes.raw.gb<\/h1>\n<pre>\r\n$ $AugustusHome\/bin\/etraining --species=SPECIES --stopCodonExcludedFromCDS=false genes.raw.gb 2> train.err\r\n$ cat train.err | perl -pe 's\/.*in sequence (\\S+): .*\/$1\/' > badgenes.lst\r\n$ $AugustusHome\/scripts\/filterGenes.pl badgenes.lst genes.raw.gb > genes.gb\r\n<\/pre>\n<p>The first command captures the error messages produced during training, the second extracts the names of the bad genes from those messages, and the third removes them. The remaining genes are used for training.<\/p>\n<p>Points to note:<\/p>\n<p>1. At least 200 gene structures are needed for training to give good results. More genes give better training, although beyond about 1000 genes the improvement becomes small.<\/p>\n<p>2. With more than 1000 genes, focus on gene quality rather than quantity. Make sure the set contains many multi-exon genes, so that the introns can be trained, and the more accurate the gene structures, the better.<\/p>\n<p>3. 
The gene set should be non-redundant. If two different genes have largely identical amino acid sequences, remove one of them. The recommended criterion is that no two genes in the set are more than 80% identical at the amino acid level. BLAST can be used for this; because the 80% threshold is fairly high, usually only 20 or so genes need to be removed.<\/p>\n<h1>3. Split gene structure set into training and test set<\/h1>\n<pre>\r\n$ $AugustusHome\/scripts\/randomSplit.pl genes.gb 100\r\n<\/pre>\n<p>This splits genes.gb into two files, genes.gb.test and genes.gb.train. The former holds 100 genes picked at random from genes.gb; the latter holds the remaining genes and is used repeatedly for training.<\/p>\n<h1>4. CREATE A META PARAMETERS FILE FOR YOUR SPECIES<\/h1>\n<pre>\r\n$ $AugustusHome\/scripts\/new_species.pl --species=lentinula_edodes\r\n<\/pre>\n<p>To build training parameters for shiitake (Lentinula edodes), the command above creates the parameter files and directory for the species; at this point the files still contain the initial default values.<\/p>\n<p>Note: if the last 3 bases of the final CDS of the training genes are not a stop codon, edit lentinula_edodes_parameters.cfg by hand and change stopCodonExcludedFromCDS from its default of false to true.<\/p>\n<h1>5. 
MAKE AN INITIAL TRAINING<\/h1>\n<pre>\r\n$ $AugustusHome\/bin\/etraining --species=lentinula_edodes genes.gb.train\r\n$ $AugustusHome\/bin\/augustus --species=lentinula_edodes genes.gb.test | tee firsttest.out\r\n$ grep -A 22 Evaluation firsttest.out\r\n<\/pre>\n<p>Train once with genes.gb.train, then use genes.gb.test to check the accuracy of the training. Sensitivity and specificity are reported at the nucleotide, exon and gene levels.<\/p>\n<p>Sensitivity is the percentage of the nucleotides, exons or genes in the test set that are detected; specificity is the percentage of the predicted nucleotides, exons or genes that exactly match those in the test set.<\/p>\n<h1>6. RUN THE SCRIPT optimize_augustus.pl<\/h1>\n<pre>\r\n$ $AugustusHome\/scripts\/optimize_augustus.pl --species=lentinula_edodes --cpus=8 genes.gb.train\r\n$ $AugustusHome\/bin\/etraining --species=lentinula_edodes genes.gb.train\r\n$ $AugustusHome\/bin\/augustus --species=lentinula_edodes genes.gb.test\r\n<\/pre>\n<p>1. 
What optimize_augustus.pl does:<\/p>\n<p>By default, optimize_augustus.pl randomly splits the genes in genes.gb.train into 8 equal parts, trains on 7 of them and evaluates accuracy on the remaining one. Rotating through the parts gives 8 schemes, each using 1 part for accuracy evaluation and the other 7 for training.<\/p>\n<p>After one random split, the 8 training and evaluation runs (one per scheme) together count as one prediction and yield one target value. This value is a weighted combination of the sensitivities and specificities at the base, exon and gene levels.<\/p>\n<p>After each prediction, if a higher target value is obtained, the corresponding value in the parameter file lentinula_edodes_parameters.cfg is updated.<\/p>\n<p>By default, 28 parameters in the parameter file are optimized in a fixed order. Normally at most 5 values are tried for each parameter, with one prediction per value (one of the 5 values may already have been used in an earlier prediction, so optimizing one parameter takes at most 5 predictions), and the value that gives the highest target 
value is kept as the value of the parameter. Optimizing every parameter once is one round, and the program stops after 5 rounds, or earlier if a round brings no improvement. As a rough estimate, when training with about 1800 genes, each augustus run predicts on a little over 200 genes and takes about 1 min, so one round takes roughly 28*4*8*1min=896min, about 15 h, and 5 rounds take about 75 h, or 3 days. If you cannot wait that long, you can stop the program manually. Because optimize_augustus.pl runs for so long, it is best run inside screen.<\/p>\n<p>Once you understand how the script works, you can terminate it when appropriate, or save the configuration file and continue the run later.<\/p>\n<p>2. After optimize_augustus.pl finishes or is interrupted, you need to (re)train AUGUSTUS with genes.gb.train, then check the prediction accuracy on genes.gb.test. If the gene level sensitivity is below 20%, the training set is probably not large enough, not of good enough quality, or the species is somehow special.<\/p>\n<h1>7. Training AUGUSTUS UTR parameters<\/h1>\n<p>This part of the training requires gene structures in which both the 5&#8217; and the 3&#8217; UTR are annotated.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Augustus training 1. 
Convert GFF file to &hellip; <a href=\"http:\/\/www.chenlianfu.com\/?p=1260\">\u7ee7\u7eed\u9605\u8bfb <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/1260"}],"collection":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1260"}],"version-history":[{"count":19,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/1260\/revisions"}],"predecessor-version":[{"id":1970,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/1260\/revisions\/1970"}],"wp:attachment":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1260"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1260"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1260"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}