Using Bactopia with AllTheBacteria Assemblies
AllTheBacteria (ATB) is a collection of nearly 2,000,000 bacterial assemblies. In this post you'll learn how to use Bactopia to seamlessly analyze these assemblies with the available Bactopia Tools.
AllTheBacteria
Zamin Iqbal's Group, who brought us 661k bacterial assemblies, has now taken it a step further with AllTheBacteria. As someone once tasked with assembling "all the Staphylococcus aureus genomes" (although, it was only about 700 samples in 2010!), this is truly an impressive feat, and a valuable community resource! With the latest assemblies, the collection is now nearly 2,000,000 bacterial assemblies! 🎉
Similar to their previous methods, the latest version of AllTheBacteria uses Shovill for assembly. In addition, each assembly has basic metrics calculated, undergoes taxonomic abundance estimation, and has sketches made available. For more details about this project, please see:
- Preprint: AllTheBacteria - all bacterial genomes assembled, available and searchable
- GitHub: AllTheBacteria
Since Zamin revealed the latest updates on AllTheBacteria, I've been wondering: how could Bactopia users take advantage of these assemblies? Especially through the available Bactopia Tools?
Why Bactopia Tools?
The really nice thing about Bactopia Tools is that they make it super easy to run 60 additional analyses on your genomes. It's really as simple as adding --wf <tool> to your Bactopia command. Bactopia will then handle the rest for you, including container selection and audit trails.
Obviously, I'm a bit biased here, but utilizing Bactopia Tools in this situation would greatly streamline a lot of downstream analyses of AllTheBacteria assemblies. I think this would allow researchers to quickly get to the science behind these assemblies.
To give you an idea, there are currently 38 Bactopia Tools that use assemblies as inputs. In other words, each of these tools would be easy to run on the 2,000,000 AllTheBacteria assemblies.
Expand to see the list of Bactopia Tools
Each of the tools listed below accepts a single assembly as input.
Tool | Description |
---|---|
bakta | Rapid annotation of bacterial genomes & plasmids |
fastani | Fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI) |
gtdb | Identify marker genes and assign taxonomic classifications |
mashtree | Create a tree using Mash distances |
abricate | Mass screening of contigs for antimicrobial and virulence genes |
abritamr | A NATA accredited tool for reporting the presence of antimicrobial resistance genes |
agrvate | Rapid identification of Staphylococcus aureus agr locus type and agr operon variants |
amrfinderplus | Identify antimicrobial resistance in genes or proteins |
btyper3 | Taxonomic classification of Bacillus cereus group isolates |
busco | Assembly completeness based on evolutionarily informed expectations |
checkm | Assess the assembly quality of your microbial samples |
ectyper | In-silico prediction of Escherichia coli serotype |
emmtyper | emm-typing of Streptococcus pyogenes assemblies |
gamma | Identification, classification, and annotation of translated gene matches |
hicap | Identify cap locus serotype and structure in your Haemophilus influenzae assemblies |
hpsuissero | Rapid Haemophilus parasuis Serotyping of assemblies |
kleborate | Screen for MLST, sub-species, and other Klebsiella related genes of interest |
legsta | Typing of Legionella pneumophila assemblies |
lissero | Serogroup typing prediction for Listeria monocytogenes |
mashdist | Calculate Mash distances between sequences |
mcroni | Sequence variation in mobilized colistin resistance (mcr-1) genes |
meningotype | Serotyping of Neisseria meningitidis |
mlst | Scan contig files against PubMLST typing schemes |
mobsuite | Reconstruct and annotate plasmids in bacterial assemblies |
pasty | Serogrouping of Pseudomonas aeruginosa isolates |
pbptyper | Penicillin Binding Protein (PBP) typer for Streptococcus pneumoniae |
phispy | Predict prophages in bacterial genomes |
plasmidfinder | Plasmid identification from assemblies |
prokka | Whole genome annotation of small genomes (bacterial, archaeal, viral) |
quast | Assess the quality of assembled contigs |
rgi | Predict antibiotic resistance from assemblies |
seqsero2 | Salmonella serotype prediction from reads or assemblies |
shigeifinder | Shigella and EIEC serotyping from assemblies |
sistr | Serovar prediction of Salmonella assemblies |
spatyper | Computational method for finding spa types in Staphylococcus aureus |
staphopiasccmec | Primer based SCCmec typing of Staphylococcus aureus genomes |
stecfinder | Serotyping of Shiga toxin-producing Escherichia coli genomes |
ssuissero | Rapid Streptococcus suis Serotyping of assemblies |
Bactopia Tools require samples processed with Bactopia
One of the key features of Bactopia Tools is that they use Bactopia outputs to rapidly identify samples and begin analysis. AllTheBacteria assemblies were not processed by Bactopia, so out of the box they aren't compatible with Bactopia Tools. But no worries, with a little work we can make this a possibility!
bactopia atb-formatter
Bactopia already allows assemblies as inputs, but I didn't want users to have to go through the full Bactopia pipeline to use the Bactopia Tools. Instead, I wanted to make a quick and easy way for users to go directly to using Bactopia Tools. To accomplish this, I created a new Bactopia command called atb-formatter (AllTheBacteria Formatter). With atb-formatter, the necessary Bactopia output directory structure is created from a directory of AllTheBacteria assemblies.
AllTheBacteria assemblies can be used with Bactopia Tools!
That's cool and all, but let's actually demonstrate the usage of atb-formatter on some Legionella pneumophila assemblies from AllTheBacteria.
Example Usage for Legionella pneumophila
To demonstrate the usage of bactopia atb-formatter, I will use assemblies for Legionella pneumophila from AllTheBacteria and run legsta, a typing tool for L. pneumophila assemblies written by Torsten Seemann. To be specific, I will run legsta through the available Bactopia Tool.
Getting Setup
Before we get started, you'll need to have Bactopia installed. If you haven't done this yet, please see the installation instructions.
You will also want to make sure you are using at least version 3.0.1 of Bactopia, as this is the first release to have the atb-formatter command.
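A rough way to confirm that an installed version is new enough is sketched below. It assumes your `sort` supports `-V` (GNU coreutils does), and the commented usage line further assumes that `bactopia --version` prints the version number as the last field of its output. `version_ge` is a hypothetical helper for illustration, not part of Bactopia.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (assumes the version is the last field printed by `bactopia --version`):
# version_ge "$(bactopia --version | awk '{print $NF}')" 3.0.1 || echo "Please upgrade Bactopia"
```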
Download the Assemblies
First, I will download the L. pneumophila assemblies from AllTheBacteria, then extract them into a folder called legionella-assemblies. Easy enough!
mkdir atb-legionella
cd atb-legionella
# Download the assemblies
wget https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.1/assembly/legionella_pneumophila__01.asm.tar.xz
wget https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.1/assembly/legionella_pneumophila__02.asm.tar.xz
# Extract the assemblies
mkdir legionella-assemblies
tar -C legionella-assemblies -xJf legionella_pneumophila__01.asm.tar.xz
tar -C legionella-assemblies -xJf legionella_pneumophila__02.asm.tar.xz
At the time of writing this, there were 5,393 L. pneumophila assemblies available from AllTheBacteria. While it's not Salmonella enterica with its hundreds of thousands of assemblies, it's a great number to demonstrate the usage of bactopia atb-formatter.
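If you want to confirm how many assemblies were actually extracted, a small sketch like the one below can help. The `.fa` file extension is an assumption about how the archives are laid out, so adjust it if your extracted files are named differently. `count_assemblies` is a hypothetical helper, not a Bactopia command.

```shell
# Hypothetical helper: count files with a given extension (default: fa, an assumption)
# anywhere under a directory.
#   $1: directory to search
#   $2: file extension without the dot (optional)
count_assemblies() {
    find "$1" -type f -name "*.${2:-fa}" | wc -l | tr -d ' '
}

# Example: count_assemblies legionella-assemblies
```

The search is recursive, so it does not matter whether the archives extract into one folder or into nested batch folders.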
Create the Bactopia Directory Structure
With the assemblies extracted, I now need to create the Bactopia directory structure required by Bactopia Tools. For this, I used bactopia atb-formatter, which creates a sample folder for each assembly, named after its BioSample accession.
# Create the Bactopia directory structure
bactopia atb-formatter --path legionella-assemblies --recursive
A few notes about bactopia atb-formatter
Please note the use of --recursive here: it traverses the legionella-assemblies directory to find all the assemblies it contains. At this point, the bactopia directory structure has been created for 5,393 assemblies and is ready for use with Bactopia Tools.
Also, by default the assemblies are not copied into the Bactopia directory structure; symbolic links are created instead. This saves disk space, but if you would like to copy the assemblies, you can use the --publish-mode parameter to change this behavior.
After running the above command, you should see something like the following:
2024-03-22 14:30:07:root:INFO - Setting up Bactopia directory structure (use --verbose to see more details)
2024-03-22 14:30:08:root:INFO - Bactopia directory structure created at bactopia
2024-03-22 14:30:08:root:INFO - Total assemblies processed: 5393
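Because the default output is made of symbolic links, moving or deleting the extracted assemblies later would leave dangling links behind. The sketch below counts the links under a directory and lists any that no longer resolve. It makes no assumption about the folder layout inside the bactopia directory; both function names are hypothetical helpers.

```shell
# Hypothetical helper: count symbolic links anywhere under a directory.
count_symlinks() {
    find "$1" -type l | wc -l | tr -d ' '
}

# Hypothetical helper: print symbolic links whose target no longer exists.
# `test -e` follows the link, so it fails for dangling links.
list_broken_symlinks() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# Example:
# count_symlinks bactopia
# list_broken_symlinks bactopia
```

An empty result from `list_broken_symlinks` means every link still points at an existing file.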
Use Bactopia to run Legsta
Fancy! Now we have all the assemblies sym-linked into a Bactopia directory structure. It's time to let Bactopia Tools shine! To do this, I will run the legsta Bactopia Tool and demonstrate how seamless it is to type 5,393 assemblies.
With a simple addition of --wf legsta and pointing to the Bactopia directory, legsta will be executed on all 5,393 assemblies! It really is that simple!
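For reference, here is a sketch of what that invocation could look like. The flags (--wf, --bactopia, -profile) are taken from the "Command Line" entry in the execution summary shown in this post; the `run_bactopia_tool` wrapper and its default values are hypothetical and only there to make the pattern reusable for other tools.

```shell
# Hypothetical wrapper around the bactopia command.
#   $1: Bactopia Tool name, for example legsta
#   $2: Bactopia directory created by atb-formatter (default: bactopia/)
#   $3: Nextflow profile (default: singularity, as in the execution summary)
run_bactopia_tool() {
    tool="$1"
    bactopia_dir="${2:-bactopia/}"
    profile="${3:-singularity}"
    bactopia --wf "$tool" --bactopia "$bactopia_dir" -profile "$profile"
}

# Example: run_bactopia_tool legsta bactopia/ singularity
```

Swapping the first argument for another assembly-based tool from the list above should follow the same pattern.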
Please use Docker or Singularity for these analyses
I'm a big supporter of Conda, but for reproducibility, it is recommended to use Docker or Singularity with Bactopia Tools. Conda environments can change depending on when they are installed, whereas the containers will always be the same.
After some time, the legsta tool will complete for all 5,393 assemblies, and you should be met with something like the following:
[5d/d04297] process > BACTOPIATOOLS:LEGSTA:LEGSTA_MODULE (SAMN29911258) [100%] 5393 of 5393 ✔
[71/c63bf7] process > BACTOPIATOOLS:LEGSTA:CSVTK_CONCAT (legsta) [100%] 1 of 1 ✔
[16/833262] process > BACTOPIATOOLS:CUSTOM_DUMPSOFTWAREVERSIONS (1) [100%] 1 of 1 ✔
Bactopia Tools: `legsta` Execution Summary
---------------------------
Bactopia Version : 3.0.1
Nextflow Version : 23.10.1
Command Line : nextflow run /home/rpetit3/bactopia/main.nf --wf legsta \
--bactopia bactopia/ -profile singularity
Resumed : false
Completed At : 2024-03-22T15:09:54.959834620-06:00
Duration : 32m 51s
Success : true
Exit Code : 0
Error Report : -
Launch Dir : /home/rpetit3/test-legsta
This took about 30 minutes on my laptop, but it was incredibly simple to run legsta on all 5,393 L. pneumophila assemblies.
Results of Typing
Here's the fun part: typing results for all 5,393 L. pneumophila assemblies! Another nice thing about Bactopia Tools is that in most cases they will merge all the results at the end, leaving you with just a single file to review.
To share the results of this analysis, I've uploaded the results to Google Drive and made them available from this link: Bactopia - Legsta Results from AllTheBacteria
From this Google Sheet, you can make a copy or export the results as a CSV file.
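Once you have the merged results as a local file, a quick tally of the typing calls can be a useful first look. The sketch below assumes the file is tab-separated with a header row, and that the sequence-based type lives in a column named SBT. Both the file path and the column name are assumptions, so check the header of your own file first. `count_column_values` is a hypothetical helper.

```shell
# Hypothetical helper: tally the values of one named column in a TSV with a header row.
#   $1: path to the tab-separated file
#   $2: column name to tally (for example SBT, an assumed column name)
# Prints "count<TAB>value", most frequent first. Prints nothing if the column is missing.
count_column_values() {
    awk -F '\t' -v col="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) exit 1
            next
        }
        { counts[$c]++ }
        END { for (v in counts) printf "%d\t%s\n", counts[v], v }
    ' "$1" | sort -k1,1nr -k2,2
}

# Example (the path is a placeholder for wherever your merged file lives):
# count_column_values path/to/legsta.tsv SBT | head
```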
Conclusion
In this post, I demonstrated how in just a few steps you can make use of Bactopia Tools to rapidly and seamlessly start analyzing the 2,000,000 bacterial assemblies from AllTheBacteria. If you are planning to do your own downstream analyses on these assemblies, I hope this post has convinced you that Bactopia can make this process much easier.
If you have any questions or ideas for additional Bactopia Tools, please feel free to reach out to me!
🎉 Also! This is the first ever blog post for Bactopia! 🎉