Prokka

The prokka module uses Prokka to rapidly annotate bacterial genomes in a standardized fashion.

Output Overview¶

Below is the default output structure for the prokka step in Bactopia. Where possible the file descriptions below were modified from a tools description.

<BACTOPIA_DIR>
├── <SAMPLE_NAME>
│   └── main
│       └── annotator
│           └── prokka
│               ├── <SAMPLE_NAME>-blastdb.tar.gz
│               ├── <SAMPLE_NAME>.faa.gz
│               ├── <SAMPLE_NAME>.ffn.gz
│               ├── <SAMPLE_NAME>.fna.gz
│               ├── <SAMPLE_NAME>.fsa.gz
│               ├── <SAMPLE_NAME>.gbk.gz
│               ├── <SAMPLE_NAME>.gff.gz
│               ├── <SAMPLE_NAME>.sqn.gz
│               ├── <SAMPLE_NAME>.tbl.gz
│               ├── <SAMPLE_NAME>.tsv
│               ├── <SAMPLE_NAME>.txt
│               └── logs
│                   ├── <SAMPLE_NAME>.{err|log}
│                   ├── nf-prokka.{begin,err,log,out,run,sh,trace}
│                   └── versions.yml
└── bactopia-runs
    └── prokka-<TIMESTAMP>
        └── nf-reports
            ├── prokka-dag.dot
            ├── prokka-report.html
            ├── prokka-timeline.html
            └── prokka-trace.txt

Results¶

Prokka¶

Below is a description of the per-sample results from Prokka.

Extension	Description
.blastdb.tar.gz	A gzipped tar archive of BLAST+ database of the contigs, genes, and proteins
.faa.gz	Protein FASTA file of the translated CDS sequences.
.ffn.gz	Nucleotide FASTA file of all the prediction transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA)
.fna.gz	Nucleotide FASTA file of the input contig sequences.
.gbk.gz	This is a standard GenBank file derived from the master .gff. If the input to prokka was a multi-FASTA, then this will be a multi-GenBank, with one record for each sequence.
.gff.gz	This is the master annotation in GFF3 format, containing both sequences and annotations. It can be viewed directly in Artemis or IGV.
.sqn.gz	An ASN1 format "Sequin" file for submission to GenBank. It needs to be edited to set the correct taxonomy, authors, related publication etc.
.tbl.gz	Feature Table file, used by "tbl2asn" to create the .sqn file.
.tsv	Tab-separated file of all features (locus_tag,ftype,len_bp,gene,EC_number,COG,product)
.txt	Statistics relating to the annotated features found.

Audit Trail¶

Below are files that can assist you in understanding which parameters and program versions were used.

Logs¶

Each process that is executed will have a folder named logs. In this folder are helpful files for you to review if the need ever arises.

Extension	Description
.begin	An empty file used to designate the process started
.err	Contains STDERR outputs from the process
.log	Contains both STDERR and STDOUT outputs from the process
.out	Contains STDOUT outputs from the process
.run	The script Nextflow uses to stage/unstage files and queue processes based on given profile
.sh	The script executed by bash for the process
.trace	The Nextflow Trace report for the process
versions.yml	A YAML formatted file with program versions

Parameters¶

Prokka¶

Parameter	Description
`--proteins`	FASTA file of trusted proteins to first annotate from Type: `string`
`--prodigal_tf`	Training file to use for Prodigal Type: `string`
`--compliant`	Force Genbank/ENA/DDJB compliance Type: `boolean`
`--centre`	Sequencing centre ID Type: `string`, Default: `Bactopia`
`--prokka_coverage`	Minimum coverage on query protein Type: `integer`, Default: `80`
`--prokka_evalue`	Similarity e-value cut-off Type: `string`, Default: `1e-09`
`--prokka_opts`	Extra Prokka options in quotes. Type: `string`

Citations¶

If you use Bactopia and prokka in your analysis, please cite the following.

Bactopia
Petit III RA, Read TD Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
Prokka
Seemann T Prokka: rapid prokaryotic genome annotation Bioinformatics 30, 2068–2069 (2014)