{"id":1578,"date":"2013-06-13T19:57:03","date_gmt":"2013-06-13T11:57:03","guid":{"rendered":"http:\/\/www.hzaumycology.com\/chenlianfu_blog\/?p=1578"},"modified":"2013-06-14T16:46:17","modified_gmt":"2013-06-14T08:46:17","slug":"%e7%bb%93%e5%90%88gatk%e5%92%8csamtools%e4%bb%8e%e5%a4%b4%e6%8c%96%e6%8e%98snps%e5%92%8cindels","status":"publish","type":"post","link":"http:\/\/www.chenlianfu.com\/?p=1578","title":{"rendered":"\u7ed3\u5408GATK\u548cSAMtools\u4ece\u5934\u6316\u6398SNPs\u548cINDELs"},"content":{"rendered":"<h1>\u63d0\u51fa\u95ee\u9898<\/h1>\n<p>\u7ed9\u51fa\u4ee5\u4e0b\u7684\u5df2\u6709\u7ed3\u679c\uff1a<br \/>\n1. \u7269\u79cdspecies\u7684\u57fa\u56e0\u7ec4fasta\u6587\u4ef6: genome.fasta<br \/>\n2. \u5bf9\u8be5\u7269\u79cd\u7684\u4e00\u4e2a\u540d\u4e3asample\u7684\u57fa\u56e0\u7ec4DNA\u6837\uff0c\u8fdb\u884c\u4e86Illumina hiseq2000 paired-end\u6d4b\u5e8f(300bp\u63d2\u5165\u7247\u6bb5\u6587\u5e93\uff0c~20x\u57fa\u56e0\u7ec4\u78b1\u57fa\u8986\u76d6\u5ea6)\uff0c\u7ed3\u679c\u6587\u4ef6\uff1a reads1.fq, reads2.fq<br \/>\n\u95ee\u600e\u4e48\u8fdb\u884cSNPs\u548cINDELs\u7684calling\uff1f<\/p>\n<p>\u6587\u7ae0\u6709\u5f88\u591a\u5730\u65b9\u5f15\u81eaNowind\u7684\u535a\u6587\uff1a<a href=\"http:\/\/blog.sina.com.cn\/s\/blog_6721167201018jik.html\" target=\"_blank\">GATK\u4f7f\u7528\u7b80\u4ecb Part2\/2<\/a><br \/>\n\u6d41\u7a0b\u53c2\u7167\uff1a<a href=\"http:\/\/www.broadinstitute.org\/gatk\/guide\/topic?name=best-practices\" target=\"_blank\">http:\/\/www.broadinstitute.org\/gatk\/guide\/topic?name=best-practices<\/a><\/p>\n<p>\u4ee5\u4e0b\u7ed9\u51fa\u5177\u4f53\u7684\u7b54\u6848\u548c\u6b65\u9aa4, \u4ee5\u4e0a\u8ff0\u5df2\u6709\u7ed3\u679c\u76843\u4e2a\u6587\u4ef6\u6240\u5728\u7684\u76ee\u5f55\u4e3a\u5f53\u524d\u5de5\u4f5c\u76ee\u5f55\uff0c\u5404\u79cd\u8f6f\u4ef6\u7684\u4e3b\u76ee\u5f55\u4ee5$software\u8868\u793a\u3002\u591a\u7ebf\u7a0b\u7684\u7a0b\u5e8f\u4ee58\u4e2a\u7ebf\u7a0b\u8fd0\u884c\u3002<\/p>\n<h1>1. Raw reads\u7684\u5904\u7406<\/h1>\n<p>\u4f7f\u7528NGSQC toolkit\u4f7f\u7528\u9ed8\u8ba4\u8bbe\u7f6e\u5bf9raw reads\u8fdb\u884c\u8fc7\u6ee4\u3002<\/p>\n<pre>\r\n$ perl $NGSQCToolkitHome\/QC\/IlluQC_PRLL.pl \\\r\n  -pe reads1.fq reads2.fq 2 5 -c 8 \\\r\n  -o QC\r\n<\/pre>\n<h1>2. \u5c06\u8fc7\u6ee4\u540e\u7684reads\u6bd4\u5bf9\u5230\u57fa\u56e0\u7ec4\u4e0a<\/h1>\n<pre>\r\n$ bowtie2-build genome.fasta genome\r\n$ bowtie2 --rg-id sample --rg \"PL:ILLUMINA\" --rg \"SM:sample\" \\\r\n  -x geome -1 .\/QC\/reads1.fq_filtered -2 .\/QC\/reads2.fq.filtered \\\r\n  -p 8 -S sample.sam\r\n<\/pre>\n<h1>3. \u5c06sam\u6587\u4ef6\u8f6c\u6362\u4e3aBam\u6587\u4ef6\u5e76\u6807\u8bb0\u51faPCR duplicates<\/h1>\n<pre>\r\n$ samtools view -bS sample.sam > sample.bam\r\n$ java -Xmx10g -jar $picardHome\/SortSam.jar \\\r\n  INPUT=sample.bam OUTPUT=sample.sort.bam \\\r\n  SORT_ORDER=coordinate\r\n$ java -Xmx10g -jar $picardHome\/MarDuplicates.jar \\\r\n  INPUT=sample.sort.bam OUTPUT=sample.dd.bam \\\r\n  METRICS_FILE=sample.dd.metrics \\\r\n  MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000\r\n<\/pre>\n<h1>4. \u6784\u5efa\u7d22\u5f15\u6587\u4ef6<\/h1>\n<pre>\r\n$ samtools faidx genome.fasta\r\n$ java -Xmx10g -jar $picardHome\/CreateSequenceDictionary.jar \\\r\n  R=genome.fasta O=genome.dict\r\n$ samtools index sample.dd.bam\r\n<\/pre>\n<h1>5. \u5bf9INDEL\u5468\u56f4\u8fdb\u884c\u91cd\u65b0\u6bd4\u5bf9(realignment)<\/h1>\n<pre>\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T RealignerTargetCreator \\\r\n  -I sample.dd.bam -o sample.realn.intervals\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T IndelRealigner \\\r\n  -targetIntervals sample.realn.intervals \\\r\n  -I sample.dd.bam -o sample.realn.bam\r\n<\/pre>\n<h1>6. \u7b2c1\u904dVariant calling<\/h1>\n<h2>6.1 \u4f7f\u7528GATK\u548csamtools\u8fdb\u884cSNP\u548cINDEL calling<\/h1>\n<pre>\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T UnifiedGenotyper \\\r\n  -I sample.realn.bam -o sample.gatk.raw1.vcf \\\r\n  --read_filter BadCigar -glm BOTH \\\r\n  -stand_call_conf 30.0 -stand_emit_conf 0\r\n$ samtools mpileup -DSugf genome.fasta sample.realn.bam | \\\r\n  bcftools view -Ncvg - > sample.samtools.raw1.vcf\r\n<\/pre>\n<h2>6.2 \u5bf9Variant\u7ed3\u679c\u8fdb\u884c\u7efc\u5408\u548c\u8fc7\u6ee4<\/h2>\n<pre>\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T SelectVariants \\\r\n  --variant sample.gatk.raw1.vcf \\\r\n  --concordance sample.samtools.raw1.vcf \\\r\n  -o sample.concordance.raw1.vcf\r\n$ MEANQUAL=`awk '{ if ($1 !~ \/#\/) { total += $6; count++ } } \\\r\n  END { print total\/count }' sample.concordance.raw1.vcf`\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T VariantFiltration \\\r\n  --filterExpression \"QD < 20.0 || ReadPosRankSum < -8.0 || \\\r\n  FS > 10.0 || QUAL < $MEANQUAL\" \\\r\n  --filterName LowQualFilter --variant sample.concordance.raw1.vcf \\\r\n  --missingValuesInExpressionsShouldEvaluateAsFailing\r\n  --logging_level ERROR -o sample.concordance.flt1.vcf\r\n$ grep -v \"Filter\" sample.concordance.flt1.vcf > \\\r\n  sample.concordance.filter1.vcf\r\n<\/pre>\n<h1>7. \u5bf9\u7b2c5\u6b65\u83b7\u5f97\u7684realign\u7684bam\u6587\u4ef6\u7684\u8fdb\u884c\u6821\u6b63<\/h1>\n<pre>\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T BaseRecalibrator \\\r\n  -I sample.realn.bam -o sample.recal_data.grp \\\r\n  -knownSites sample.concordance.filter1.vcf\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T PrintReads \\\r\n  -I sample.realn.bam -o sample.recal.bam \\\r\n  -BQSR sample.recal_data.grp\r\n<\/pre>\n<h1>8. \u7b2c2\u904dVariant calling<\/h1>\n<pre>\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T UnifiedGenotyper \\\r\n  -I sample.recal.bam -o sample.gatk.raw2.vcf \\\r\n  --read_filter BadCigar -glm BOTH \\\r\n  -stand_call_conf 30.0 -stand_emit_conf 0\r\n$ samtools mpileup -DSugf genome.fasta sample.recal.bam | \\\r\n  bcftools view -Ncvg - > sample.samtools.raw2.vcf\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T SelectVariants \\\r\n  --variant sample.gatk.raw2.vcf \\\r\n  --concordance sample.samtools.raw2.vcf \\\r\n  -o sample.concordance.raw2.vcf\r\n$ java -Xmx10g -jar $GATKHome\/GenomeAnalysisTK.jar \\\r\n  -R genome.fasta -T VariantFiltration \\\r\n  --filterExpression \"QD < 10.0 || ReadPosRankSum < -8.0 || \\\r\n  FS > 10.0 || QUAL < 30\" \\\r\n  --filterName LowQualFilter --variant sample.concordance.raw2.vcf \\\r\n  --missingValuesInExpressionsShouldEvaluateAsFailing\r\n  --logging_level ERROR -o sample.concordance.flt2.vcf\r\n$ grep -v \"Filter\" sample.concordance.flt2.vcf > \\\r\n  sample.concordance.filter2.vcf\r\n$ cp sample.concordance.filter2.vcf sample.final.vcf\r\n<\/pre>\n<p>\u6700\u540e\u7684\u7ed3\u679c\u6587\u4ef6\u4e3asample.final.vcf\u3002\u5173\u4e8e\u5177\u4f53\u7684vcf\u7684\u683c\u5f0f\u8be6\u89e3\u89c1\uff1a<a href=\"http:\/\/www.hzaumycology.com\/chenlianfu_blog\/?p=1514\" target=\"_blank\">http:\/\/www.hzaumycology.com\/chenlianfu_blog\/?p=1514<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u63d0\u51fa\u95ee\u9898 \u7ed9\u51fa\u4ee5\u4e0b\u7684\u5df2\u6709\u7ed3\u679c\uff1a 1. \u7269\u79cdspecies\u7684\u57fa\u56e0\u7ec4fasta\u6587\u4ef6: &hellip; <a href=\"http:\/\/www.chenlianfu.com\/?p=1578\">\u7ee7\u7eed\u9605\u8bfb <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[],"tags":[],"_links":{"self":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/1578"}],"collection":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1578"}],"version-history":[{"count":21,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/1578\/revisions"}],"predecessor-version":[{"id":2021,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=\/wp\/v2\/posts\/1578\/revisions\/2021"}],"wp:attachment":[{"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1578"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1578"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.chenlianfu.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1578"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}