AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences.

1. Installing Augustus

Download Augustus: http://bioinf.uni-greifswald.de/augustus/binaries/

$ wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus.2.7.tar.gz
$ tar zxf augustus.2.7.tar.gz
$ cd augustus.2.7
$ cd src
$ make -j 8
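$ export AUGUSTUS_CONFIG_PATH=$PWD/../config/
(This line can be added to ~/.bashrc. If you do, replace $PWD/.. with the absolute path of the augustus.2.7 directory, because $PWD will be different in a new shell.)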
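After installation, a quick sanity check can catch an unset or wrong AUGUSTUS_CONFIG_PATH before a prediction is started. This is a minimal sketch that assumes the standard config layout, in which each species' parameter files live in $AUGUSTUS_CONFIG_PATH/species/<species>/; the function name check_augustus_species is hypothetical and not part of AUGUSTUS.

```shell
# Hypothetical helper (requires bash): check that AUGUSTUS_CONFIG_PATH is set
# and that a parameter directory exists for the requested species.
# Assumes the layout $AUGUSTUS_CONFIG_PATH/species/<species>/.
check_augustus_species() {
  local species="$1"
  if [ -z "${AUGUSTUS_CONFIG_PATH:-}" ]; then
    echo "AUGUSTUS_CONFIG_PATH is not set" >&2
    return 1
  fi
  if [ ! -d "$AUGUSTUS_CONFIG_PATH/species/$species" ]; then
    echo "no parameters for '$species' in $AUGUSTUS_CONFIG_PATH/species" >&2
    return 1
  fi
  echo "ok: $species"
}

# Usage: check_augustus_species human && augustus --species=human genome.fa
```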

2. Using Augustus

2.1 Gene prediction examples

$ augustus --strand=both --genemodel=partial --singlestrand=false \
    --hintsfile=hints.gff --extrinsicCfgFile=extrinsic.cfg \
    --protein=on --introns=on --start=on --stop=on --cds=on --codingseq=on \
    --alternatives-from-evidence=true --gff3=on --UTR=on \
    --outfile=out.gff --species=human genome.fa
$ augustus --noprediction=true --species=SPECIES sequences.gb

2.2 Augustus parameters

Usage:

augustus [parameters] --species=SPECIES queryfilename

Important parameters:

--strand=both, --strand=forward or --strand=backward
report predicted genes on both strands, just the forward or
just the backward strand. The default is 'both'.

--genemodel=partial, --genemodel=intronless, --genemodel=complete,
--genemodel=atleastone or --genemodel=exactlyone
partial : allow prediction of incomplete genes at the sequence boundaries (default)
intronless : only predict single-exon genes like in prokaryotes and some eukaryotes
complete : only predict complete genes
atleastone : predict at least one complete gene
exactlyone : predict exactly one complete gene

--singlestrand=true
predict genes independently on each strand, allow overlapping
genes on opposite strands. This option is turned off by default.

--hintsfile=hintsfilename
When this option is used, prediction that takes hints (extrinsic
information) into account is turned on. hintsfilename contains the
hints in GFF format.
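For illustration, here is one hints line and a rough format check. The example line (sequence chr1, source b2h, an intron hint tagged src=E) uses hypothetical values; which sources are accepted and how they are weighted is defined in the extrinsic config file. The check_hints function is not part of AUGUSTUS: it only verifies that every non-comment line has the nine tab-separated GFF columns and a src= tag in the last column.

```shell
# Write a one-line example hints file (hypothetical values; columns are
# seqname, source, feature, start, end, score, strand, frame, attributes).
printf 'chr1\tb2h\tintron\t1449\t1598\t0\t+\t.\tsrc=E\n' > hints.gff

# Hypothetical validator: report every non-comment, non-empty line that lacks
# 9 tab-separated columns or a src= tag in column 9; non-zero exit if any.
check_hints() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }
    NF != 9 || $9 !~ /src=/ { printf "line %d: malformed hint\n", NR; bad = 1 }
    END { exit bad }
  ' "$1"
}

check_hints hints.gff && echo "hints.gff looks well-formed"
```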

--extrinsicCfgFile=cfgfilename
Optional. This file contains the list of sources used for the
hints and their boni and mali (bonuses and penalties). If not
specified, the file "extrinsic.cfg" in the config directory
$AUGUSTUS_CONFIG_PATH is used.

--maxDNAPieceSize=n
This value specifies the maximal length of the pieces that the
sequence is cut into for the core algorithm (Viterbi) to be run.
Default is --maxDNAPieceSize=200000.
AUGUSTUS tries to place the boundaries of these pieces in the
intergenic region, which is inferred by a preliminary prediction.
GC-content dependent parameters are chosen for each piece of DNA
if /Constant/decomp_num_steps > 1 for that species. This is why
this value should not be set very large, even if you have plenty
of memory.
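As rough arithmetic, a sequence of length L is cut into at least ceil(L / n) pieces for n = maxDNAPieceSize; the actual number can be higher because AUGUSTUS shifts the boundaries into predicted intergenic regions. The helper name min_pieces is hypothetical; the default of 200000 is the one stated above.

```shell
# Hypothetical helper: lower bound on the number of pieces, computed as
# ceil(seq_len / max_piece_size) with integer arithmetic.
# The real count can be higher, since boundaries move into intergenic regions.
min_pieces() {
  local seq_len="$1" max_piece_size="${2:-200000}"  # 200000 = documented default
  echo $(( (seq_len + max_piece_size - 1) / max_piece_size ))
}

min_pieces 1000000          # prints 5
min_pieces 1000001 200000   # prints 6
```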

--protein=on/off
--introns=on/off
--start=on/off
--stop=on/off
--cds=on/off
--codingseq=on/off
Output options: output the predicted protein sequences and coding
sequences, and switch the reporting of introns, start codons and
stop codons on or off. With --cds=on, 'CDS' lines are reported in
addition to the 'initial', 'internal', 'terminal' and 'single' exon
lines. The CDS excludes the stop codon (unless
stopCodonExcludedFromCDS=false), whereas the terminal and single
exons include it.
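A minimal sketch for collecting the protein sequences produced by --protein=on into a FASTA file. It assumes the default output layout, in which each gene is introduced by a '# start gene <id>' comment line and the protein appears as '# protein sequence = [...]', possibly wrapped over several '# '-prefixed lines with the closing ']' on the last one. The function name and the '<gene>.p<N>' FASTA headers are hypothetical choices.

```shell
# Hypothetical helper: convert "# protein sequence = [...]" comment blocks
# in an AUGUSTUS output file into FASTA records named <gene>.p<N>.
augustus_proteins_to_fasta() {
  awk '
    /^# start gene / { gene = $4; n = 0; next }
    /^# protein sequence = \[/ {            # first line of a protein
      inprot = 1; n++
      seq = $0
      sub(/^# protein sequence = \[/, "", seq)
    }
    inprot && !/^# protein sequence = \[/ { # wrapped continuation lines
      line = $0
      sub(/^# /, "", line)
      seq = seq line
    }
    inprot && /]$/ {                         # closing bracket ends the protein
      sub(/]$/, "", seq)
      printf ">%s.p%d\n%s\n", gene, n, seq
      inprot = 0
    }
  ' "$1"
}

# Usage: augustus_proteins_to_fasta out.gff > proteins.fa
```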

--AUGUSTUS_CONFIG_PATH=path
path to config directory (if not specified as environment
variable)

--alternatives-from-evidence=true/false
report alternative transcripts when they are suggested by hints

--alternatives-from-sampling=true/false
report alternative transcripts generated through probabilistic
sampling

--sample=n
--minexonintronprob=p
--minmeanexonintronprob=p
--maxtracks=n

--proteinprofile=filename
Read a protein profile from file filename. See section 7 of the AUGUSTUS README.

--predictionStart=A, --predictionEnd=B
A and B define the range of the sequence for which predictions
should be found. This is faster if you need predictions only for
a small part of the sequence.

--gff3=on/off
output in gff3 format.

--UTR=on/off
predict the untranslated regions in addition to the coding
sequence. This currently works only for human, galdieria,
toxoplasma and caenorhabditis.

--outfile=filename
print output to filename instead of standard output. This is
useful in computing environments, e.g. parasol jobs, that do
not allow shell redirection.

--noInFrameStop=true/false
Don't report transcripts with in-frame stop codons. Otherwise,
intron-spanning stop codons could occur. Default: false
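An in-frame stop is a stop codon that appears before the end of the spliced coding sequence when it is read codon by codon from the first base. Because splicing joins exons, such a codon can be formed across an intron boundary even though it is not visible as a contiguous triplet in the genomic sequence. The sketch below checks a spliced coding sequence (for example one reported with --codingseq=on) for this, assuming the standard genetic code (stops TAA, TAG, TGA) and a length that is a multiple of 3; the function name is hypothetical.

```shell
# Hypothetical helper (requires bash): return 0 (true) if a spliced coding
# sequence contains a stop codon before its final codon.
has_inframe_stop() {
  local seq codon i
  seq=$(printf '%s' "$1" | tr 'acgt' 'ACGT')
  # Skip the last codon, which is normally the real stop codon.
  # Assumes the sequence length is a multiple of 3.
  for (( i = 0; i + 3 < ${#seq}; i += 3 )); do
    codon=${seq:i:3}
    case "$codon" in
      TAA|TAG|TGA) return 0 ;;
    esac
  done
  return 1
}

has_inframe_stop ATGTAAGGGTGA && echo "in-frame stop found"
```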

--noprediction=true/false
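If true and the input is in GenBank format, no prediction is made.
This is useful for getting the annotated protein sequences. AUGUSTUS
can also take a GenBank-format file as input, predict genes in it,
compare the predictions with the GenBank annotation, and report
accuracy statistics. Because some sequences in a GenBank file have
no CDS annotation, this parameter can be used to find the IDs of
those sequences; after removing them, the prediction accuracy can
be evaluated.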
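The removal of sequences without a CDS annotation can be sketched with awk as an alternative to editing the file by hand. This hypothetical helper (not part of AUGUSTUS) assumes the standard GenBank flat-file layout, where feature keys start in column 6 and every record ends with a '//' line. It keeps only records that contain a CDS feature line and does not check whether that CDS is otherwise valid.

```shell
# Hypothetical helper: copy only the GenBank records that contain
# at least one CDS feature line.
filter_genbank_with_cds() {
  awk '
    { rec = rec $0 "\n" }          # buffer the current record
    /^     CDS / { has_cds = 1 }   # CDS feature key (5 leading spaces)
    /^\/\/$/ {                     # "//" terminates a record
      if (has_cds) printf "%s", rec
      rec = ""; has_cds = 0
    }
  ' "$1"
}

# Usage: filter_genbank_with_cds sequences.gb > sequences.cds_only.gb
```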

--contentmodels=on/off
If 'off', the content models are disabled (all emissions uniformly
1/4). The content models are: the coding-region Markov chain
(emiprobs), the initial k-mers in the coding region (Pls), and the
intron and intergenic-region Markov chain. This option is intended
for special applications that require judging gene structures from
the signal models only, e.g. for predicting the effect of SNPs or
mutations on splicing. For all typical gene predictions this should
be on. Default: on
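A simplified view of what "all emissions uniformly 1/4" means, looking only at the content term: every nucleotide is emitted with probability 1/4 whatever its label (coding, intron, intergenic), so for a stretch of length L

$$P(x_1 \dots x_L) = \prod_{i=1}^{L} \frac{1}{4} = 4^{-L}, \qquad \log P(x_1 \dots x_L) = -L \log 4.$$

This value is the same under every candidate gene structure, so the content terms no longer favour one structure over another, leaving the signal models to do the discriminating.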

--paramlist
For a complete list of parameters, type "augustus --paramlist"

Chen Lianfu's bioinformatics blog (陈连福的生信博客)

Augustus installation and usage parameters