Bakta

The bakta module uses Bakta to rapidly annotate bacterial genomes and plasmids in a standardized fashion. Bakta makes use of a large database (40+ GB) to provide extensive annotations including: tRNA, tmRNA, rRNA, ncRNA, CRISPR, CDS, and sORFs.

Output Overview

Below is the default output structure for the bakta step in Bactopia. Where possible, the file descriptions below were adapted from the tool's own description.

<BACTOPIA_DIR>
├── <SAMPLE_NAME>
│   └── main
│       └── annotator
│           └── bakta
│               ├── <SAMPLE_NAME>-blastdb.tar.gz
│               ├── <SAMPLE_NAME>.embl.gz
│               ├── <SAMPLE_NAME>.faa.gz
│               ├── <SAMPLE_NAME>.ffn.gz
│               ├── <SAMPLE_NAME>.fna.gz
│               ├── <SAMPLE_NAME>.gbff.gz
│               ├── <SAMPLE_NAME>.gff3.gz
│               ├── <SAMPLE_NAME>.hypotheticals.faa.gz
│               ├── <SAMPLE_NAME>.hypotheticals.tsv
│               ├── <SAMPLE_NAME>.tsv
│               ├── <SAMPLE_NAME>.txt
│               └── logs
│                   ├── nf-bakta.{begin,err,log,out,run,sh,trace}
│                   └── versions.yml
└── bactopia-runs
    └── bakta-<TIMESTAMP>
        └── nf-reports
            ├── bakta-dag.dot
            ├── bakta-report.html
            ├── bakta-timeline.html
            └── bakta-trace.txt
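
The layout above can be checked programmatically. The following is a minimal sketch, assuming the directory structure and file suffixes shown in the tree; the helper names `bakta_dir` and `missing_outputs` are hypothetical and not part of Bactopia.

```python
from pathlib import Path

# File suffixes copied from the output tree above.
BAKTA_SUFFIXES = [
    "-blastdb.tar.gz", ".embl.gz", ".faa.gz", ".ffn.gz", ".fna.gz",
    ".gbff.gz", ".gff3.gz", ".hypotheticals.faa.gz", ".hypotheticals.tsv",
    ".tsv", ".txt",
]


def bakta_dir(bactopia_dir, sample):
    """Return the per-sample Bakta output directory (hypothetical helper)."""
    return Path(bactopia_dir) / sample / "main" / "annotator" / "bakta"


def missing_outputs(bactopia_dir, sample):
    """List the expected suffixes that have no matching file for this sample."""
    base = bakta_dir(bactopia_dir, sample)
    return [s for s in BAKTA_SUFFIXES if not (base / f"{sample}{s}").is_file()]
```

An empty list from `missing_outputs` suggests every file named in the tree is present for that sample.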

Results

Bakta

Below is a description of the per-sample results from Bakta.

| Extension | Description |
| --- | --- |
| -blastdb.tar.gz | A gzipped tar archive of a BLAST+ database of the contigs, genes, and proteins |
| .embl.gz | Annotations & sequences in (multi) EMBL format |
| .faa.gz | CDS/sORF amino acid sequences as FASTA |
| .ffn.gz | Feature nucleotide sequences as FASTA |
| .fna.gz | Replicon/contig DNA sequences as FASTA |
| .gbff.gz | Annotations & sequences in (multi) GenBank format |
| .gff3.gz | Annotations & sequences in GFF3 format |
| .hypotheticals.faa.gz | Hypothetical protein CDS amino acid sequences as FASTA |
| .hypotheticals.tsv | Further information on hypothetical protein CDS as simple human-readable tab-separated values |
| .tsv | Annotations as simple human-readable tab-separated values |
| .txt | Broad summary of Bakta annotations |
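
As an illustration of working with these files, the sketch below counts records in the gzipped protein FASTA files and reports what share of them are hypothetical. It assumes that the hypothetical proteins in `.hypotheticals.faa.gz` are also present in `.faa.gz`; the function names are hypothetical and only the Python standard library is used.

```python
import gzip
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a gzipped FASTA file by counting '>' header lines."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def hypothetical_fraction(bakta_dir, sample):
    """Fraction of protein records that are hypothetical (0.0 if no proteins)."""
    base = Path(bakta_dir)
    total = count_fasta_records(base / f"{sample}.faa.gz")
    hypothetical = count_fasta_records(base / f"{sample}.hypotheticals.faa.gz")
    return hypothetical / total if total else 0.0
```

Here `bakta_dir` is the per-sample `bakta` folder from the output overview, and `sample` is the `<SAMPLE_NAME>` prefix.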

Audit Trail

Below are files that can assist you in understanding which parameters and program versions were used.

Logs

Each process that is executed will have a folder named logs. In this folder are helpful files for you to review if the need ever arises.

| Extension | Description |
| --- | --- |
| .begin | An empty file used to designate the process started |
| .err | Contains STDERR outputs from the process |
| .log | Contains both STDERR and STDOUT outputs from the process |
| .out | Contains STDOUT outputs from the process |
| .run | The script Nextflow uses to stage/unstage files and queue processes based on given profile |
| .sh | The script executed by bash for the process |
| .trace | The Nextflow Trace report for the process |
| versions.yml | A YAML formatted file with program versions |
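
For a quick look at a `logs` folder, something like the following could be used. It is a rough sketch that relies only on the file names listed above (with the `nf-bakta` prefix from the output tree); `summarize_logs` is a hypothetical helper, not a Bactopia command.

```python
from pathlib import Path


def summarize_logs(logs_dir, prefix="nf-bakta"):
    """Report basic facts about a process's logs folder."""
    logs = Path(logs_dir)
    err = logs / f"{prefix}.err"
    return {
        # The .begin file is empty; its presence marks that the process started.
        "started": (logs / f"{prefix}.begin").exists(),
        # Number of lines written to STDERR (0 if the file is absent).
        "stderr_lines": len(err.read_text().splitlines()) if err.exists() else 0,
        # Whether program versions were recorded.
        "has_versions": (logs / "versions.yml").exists(),
    }
```

A non-zero `stderr_lines` does not necessarily indicate a failure, since many tools write progress messages to STDERR.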

Parameters

Bakta Download

| Parameter | Description | Type |
| --- | --- | --- |
| --bakta_db | Tarball or path to the Bakta database | string |
| --bakta_db_type | Which Bakta DB to download: 'full' (~30GB) or 'light' (~2GB) | string (default: full) |
| --bakta_save_as_tarball | Save the Bakta database as a tarball | boolean |
| --download_bakta | Download the Bakta database to the path given by --bakta_db | boolean |

Bakta

| Parameter | Description | Type |
| --- | --- | --- |
| --proteins | FASTA file of trusted proteins to first annotate from | string |
| --prodigal_tf | Training file to use for Prodigal | string |
| --replicons | Replicon information table (tsv/csv) | string |
| --min_contig_length | Minimum contig size to annotate | integer (default: 1) |
| --keep_contig_headers | Keep original contig headers | boolean |
| --compliant | Force GenBank/ENA/DDBJ compliance | boolean |
| --skip_trna | Skip tRNA detection & annotation | boolean |
| --skip_tmrna | Skip tmRNA detection & annotation | boolean |
| --skip_rrna | Skip rRNA detection & annotation | boolean |
| --skip_ncrna | Skip ncRNA detection & annotation | boolean |
| --skip_ncrna_region | Skip ncRNA region detection & annotation | boolean |
| --skip_crispr | Skip CRISPR array detection & annotation | boolean |
| --skip_cds | Skip CDS detection & annotation | boolean |
| --skip_sorf | Skip sORF detection & annotation | boolean |
| --skip_gap | Skip gap detection & annotation | boolean |
| --skip_ori | Skip oriC/oriT detection & annotation | boolean |
| --bakta_opts | Extra Bakta options in quotes. Example: '--gram +' | string |
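
To show how the parameters above combine on a command line, here is a small sketch that assembles an argument list from keyword arguments. The base invocation `bactopia --wf bakta --bactopia <BACTOPIA_DIR>` is an assumption about how the workflow is launched and should be checked against your Bactopia version; `build_bakta_command` is a hypothetical helper. Boolean parameters are emitted as bare flags, and other values follow their flag.

```python
import shlex


def build_bakta_command(bactopia_dir, **params):
    """Build an argument list for the bakta workflow (assumed invocation)."""
    # Assumption: the workflow is started with --wf bakta and --bactopia <dir>.
    cmd = ["bactopia", "--wf", "bakta", "--bactopia", str(bactopia_dir)]
    for name, value in params.items():
        flag = f"--{name}"
        if value is True:
            cmd.append(flag)  # boolean parameter: flag only
        elif value is not False and value is not None:
            cmd.extend([flag, str(value)])  # string/integer parameter
    return cmd


if __name__ == "__main__":
    command = build_bakta_command(
        "results",                 # placeholder for <BACTOPIA_DIR>
        bakta_db="/data/bakta",    # placeholder database path
        skip_trna=True,
        min_contig_length=200,
        bakta_opts="--gram +",
    )
    # shlex.join quotes values containing spaces, such as the --bakta_opts value.
    print(shlex.join(command))
```

Building a list rather than a single string keeps values such as the `--bakta_opts` string intact as one argument when passed to `subprocess.run`.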

Citations

If you use Bactopia and bakta in your analysis, please cite the following.