
Enhancements to Open Source Software

Maintaining open source software is a difficult challenge. It's often a time-consuming and completely voluntary process with little to no recognition. The field of bioinformatics is not immune to this. Many of the tools we use on a daily basis are maintained by individuals with little to no support. Bactopia is no different.

Being fully aware of these challenges, when I first started developing Bactopia I wanted to build in mechanisms for contributing back to the community. To achieve this, I implemented a few design requirements:

  1. Tools must be open source and free to use.
  2. Tools must be available from Conda.
  3. Bactopia Tools must be available on nf-core/modules.

Bactopia has provided 156+ contributions to the bioinformatics community

  • 10 stand-alone tools, each available from Bioconda
  • 29 new Conda recipes, 35 updated recipes, and 1,750+ Bioconda pull requests reviewed.
  • 62 contributions to nf-core/modules
  • 20 contributions to other tools

These contributions are to the wider community and do not require you to use Bactopia to take advantage of them.

Stand-Alone Tools

Occasionally, tools are developed for specific tasks in Bactopia. For example, Dragonflye was developed to add Nanopore support to Bactopia. Each of these tools was developed to be stand-alone. Below are 10 tools, originally built for Bactopia, that you can use outside of Bactopia.

Tool Description
assembly-scan Generate basic stats for an assembly
dragonflye Assemble bacterial isolate genomes from Nanopore reads
fastq-dl Download FASTQ files from SRA or ENA repositories.
fastq-scan Output FASTQ summary statistics in JSON format
pasty A tool for in silico serogrouping of Pseudomonas aeruginosa isolates
pbptyper In silico Penicillin Binding Protein typer for Streptococcus pneumoniae
pmga A fork of PMGA for all Neisseria species and Haemophilus influenzae
shovill-se A fork of Shovill that includes support for single end reads
staphopia-sccmec A standalone version of Staphopia’s SCCmec typing method
vcf-annotator Add biological annotations to variants in a given VCF file

Bioconda Contributions

Bactopia requires that tools be installable with Conda to make installation easier for users. An unintended side effect of this has been greater involvement with the Bioconda community. Bioconda is an amazing resource, and its value doesn't end with `conda install`! For every recipe added to Bioconda, a Docker container is created by BioContainers and a Singularity image is created by the Galaxy Project. In other words, a single recipe makes a huge contribution to the community.

Bactopia has led to 29 new recipes and 35 updated recipes, and more than 1,750 Bioconda pull requests have been reviewed.

New Recipes

Bactopia has led to the addition of 29 new recipes to Bioconda and conda-forge. These new recipes allow users to rapidly begin using these tools for their own analyses, and include:

Tool Description Pull Request
Aspera Connect high-performance transfer client anaconda/rpetit3
assembly-scan Generate basic stats for an assembly bioconda/bioconda-recipes#11425
bactopia A flexible pipeline for complete analysis of bacterial genomes bioconda/bioconda-recipes#17434
Dragonflye Assemble bacterial isolate genomes from Nanopore reads bioconda/bioconda-recipes#29696
ena-dl Download FASTQ files from ENA bioconda/bioconda-recipes#17354
EToKi all methods related to Enterobase bioconda/bioconda-recipes#37069
executor programmer friendly Python subprocess wrapper conda-forge/staged-recipes#9457
fastq-dl Download FASTQ files from SRA or ENA repositories. bioconda/bioconda-recipes#18252
fastq-scan Output FASTQ summary statistics in JSON format bioconda/bioconda-recipes#11415
GenoTyphi assign genotypes to Salmonella Typhi genomes bioconda/bioconda-recipes#25674
illumina-cleanup A simple pipeline for pre-processing Illumina FASTQ files bioconda/bioconda-recipes#11481
ISMapper insertion sequence mapping software bioconda/bioconda-recipes#14180
mashpit Sketch-based surveillance platform bioconda/bioconda-recipes#35199
NextPolish Fast and accurate polishing of genomes generated by long reads bioconda/bioconda-recipes#36582
ParallelTask A simple and lightweight parallel task engine conda-forge/staged-recipes#19616
pasty A tool for in silico serogrouping of Pseudomonas aeruginosa isolates bioconda/bioconda-recipes#35930
pbptyper In silico Penicillin Binding Protein typer for Streptococcus pneumoniae bioconda/bioconda-recipes#36222
pHierCC Hierarchical clustering of cgMLST bioconda/bioconda-recipes#37070
pmga Command-line version of PMGA (PubMLST Genome Annotator) bioconda/bioconda-recipes#32801
property-manager useful property variants for Python programming conda-forge/staged-recipes#9442
RFPlasmid predicting plasmid contigs from assemblies bioconda/bioconda-recipes#25849
SerotypeFinder Identifies the serotype in total or partial sequenced isolates of E. coli bioconda/bioconda-recipes#29718
shovill-se A fork of Shovill that includes support for single end reads bioconda/bioconda-recipes#26040
spaTyper computational method for finding spa types bioconda/bioconda-recipes#26044
sra-human-scrubber Identify and remove human reads from FASTQ files bioconda/bioconda-recipes#29926
staphopia-sccmec A standalone version of Staphopia's SCCmec typing method bioconda/bioconda-recipes#28214
tbl2asn-forever use tbl2asn forever by pretending that it's still 2019 bioconda/bioconda-recipes#20073
vcf-annotator Add biological annotations to variants in a given VCF file bioconda/bioconda-recipes#13417

Every recipe gets a Docker and Singularity container

It is sometimes overlooked, so it's worth reiterating: every recipe added to Bioconda has a Docker container created by BioContainers and a Singularity image created by the Galaxy Project. These containers allow for version-controlled, reproducible analyses.
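As a minimal sketch of what this means in practice, the Python helper below builds the Docker and Singularity image references for a Bioconda package. It assumes the tag pattern `<package>:<version>--<build>` used by BioContainers on quay.io and by the Galaxy Project's Singularity depot. The function name `container_uris` is hypothetical, and the version and build strings in the usage example are made-up placeholders; check the real tags on quay.io/biocontainers before pulling an image.

```python
from typing import Dict


def container_uris(package: str, version: str, build: str) -> Dict[str, str]:
    """Return Docker and Singularity image URIs for a Bioconda package.

    Assumes the naming pattern <package>:<version>--<build> shared by
    BioContainers (quay.io) and the Galaxy Project Singularity depot.
    """
    tag = f"{package}:{version}--{build}"
    return {
        # Docker image built by BioContainers and hosted on quay.io
        "docker": f"quay.io/biocontainers/{tag}",
        # Singularity image built and hosted by the Galaxy Project
        "singularity": f"https://depot.galaxyproject.org/singularity/{tag}",
    }


if __name__ == "__main__":
    # Hypothetical version and build string, for illustration only.
    for kind, uri in container_uris("fastq-scan", "1.0.0", "h0000000_0").items():
        print(f"{kind}: {uri}")
```

Because both images share the same tag as the Bioconda build, pinning a single version string is enough to get the same tool in Conda, Docker, or Singularity.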

Enhancements and Fixes

A common issue with Bioconda recipes is that a tool works great in a Conda environment but fails for various reasons when containerized. When this happens with a tool used by Bactopia, an effort is made to improve or fix its Bioconda recipe. Below is a list of fixes and improvements to Bioconda recipes:

Tool Description Pull Request
pggb Update pinnings in pggb bioconda/bioconda-recipes#35734
Nullarbor Rebuild nullarbor container bioconda/bioconda-recipes#35687
GenoTyphi Update genotyphi recipe for mykrobe based analysis bioconda/bioconda-recipes#35388
Seroba Add database to Seroba recipe bioconda/bioconda-recipes#35378
Ariba Update ariba dependencies for latest pymummer bioconda/bioconda-recipes#35383
pymummer patch pymummer recipe to use system/user TMP bioconda/bioconda-recipes#35379
PlasmidFinder Update PlasmidFinder for better container support bioconda/bioconda-recipes#35314
GTDB-Tk Allow GTDB-Tk database download with container bioconda/bioconda-recipes#35174
ShigaTyper update shigatyper recipe for better container support bioconda/bioconda-recipes#35161
FastANI Remove fastani from build fail list bioconda/bioconda-recipes#33556
FastANI update FastANI recipe bioconda/bioconda-recipes#33433
Prokka Update Prokka bioperl pinning bioconda/bioconda-recipes#33411
SsuisSero update SsuisSero dependency bioconda/bioconda-recipes#33268
RGI Improve RGI docker container bioconda/bioconda-recipes#33249
legsta Improve dockerbuild for Legsta bioconda/bioconda-recipes#33246
fastq-scan Update fastq-scan recipe to include jq bioconda/bioconda-recipes#32650
Ariba Patch ariba recipe with minor bug fixes bioconda/bioconda-recipes#32258
PIRATE Update PIRATE recipe to include post-analysis scripts bioconda/bioconda-recipes#31629
ngmaster rebuild ngmaster to get docker container bioconda/bioconda-recipes#31376
AgrVATE add missing dependency for agrvate bioconda/bioconda-recipes#31035
spaTyper Patch spatyper for entrypoint support bioconda/bioconda-recipes#30824
spaTyper Patch spatyper for better container support bioconda/bioconda-recipes#30622
Kleborate Update kleborate recipe to build DB bioconda/bioconda-recipes#30582
cyvcf2 Loosen htslib version requirement for cyvcf2 bioconda/bioconda-recipes#30044
Kleborate Patch Kleborate's method for discovering Kaptive bioconda/bioconda-recipes#29623
spaTyper update spatyper - drop blake_sha256 requirement bioconda/bioconda-recipes#27321
ISMapper ISMapper - Fix BioPython pinning bioconda/bioconda-recipes#26599
CheckM checkm-genome - fix broken pinning by older pysam version bioconda/bioconda-recipes#25856
ISMapper Update ISMapper - Pin BioPython version bioconda/bioconda-recipes#24314
Ariba Patches for third party links used by Ariba bioconda/bioconda-recipes#24010
Seroba Add pysam pinning for Seroba bioconda/bioconda-recipes#17568
Ariba Update pysam pinning for Ariba bioconda/bioconda-recipes#17448
tbl2asn Previous version of tbl2asn expired; updated to 25.7 bioconda/bioconda-recipes#16131
ISMapper Rebuild ismapper for GCC7 migration bioconda/bioconda-recipes#14276
MentaLiST MentaLiST v0.2.4 patch for Julia bioconda/bioconda-recipes#13137

nf-core/modules Contributions

Converting Bactopia to Nextflow DSL2 created the opportunity to adopt modules from nf-core/modules, which users can easily include in their own Nextflow DSL2 pipelines. To support this, a requirement was made that every Bactopia Tool (a separate workflow) have a corresponding module available from nf-core/modules; if one was not available, it would be added.

Adopting this practice has led to 62 contributions to nf-core/modules in the form of new modules, module updates, and testing adjustments.
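To illustrate what reusing one of these modules looks like, here is a small Python sketch that builds the Nextflow DSL2 `include` statement for a module name such as `mash/dist` or `csvtk/concat`. It assumes two nf-core conventions: the process name is the module name upper-cased with `/` replaced by `_`, and `nf-core modules install` places the module at `modules/nf-core/<module>/main.nf`. It also assumes the statement is written in a script at the pipeline root. The helper `nfcore_include` is hypothetical, not part of nf-core/tools.

```python
def nfcore_include(module: str) -> str:
    """Build a Nextflow DSL2 include statement for an installed nf-core module.

    Assumes nf-core naming conventions (process name = module name upper-cased,
    '/' -> '_') and the modules/nf-core/<module>/main.nf install layout,
    referenced from a script at the pipeline root.
    """
    process = module.upper().replace("/", "_")
    # Doubled braces render as literal '{' and '}' in the f-string.
    return f"include {{ {process} }} from './modules/nf-core/{module}/main'"


if __name__ == "__main__":
    for name in ["mash/dist", "csvtk/concat"]:
        print(nfcore_include(name))
```

If the include lives in a subworkflow rather than the root script, the relative path would need adjusting accordingly.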

Tool Description Pull Request
ShigEiFinder add shigeifinder module nf-core/modules#2523
nf-core/modules fix a few tests after restructure nf-core/modules#2234
Biohansel add biohansel module nf-core/modules#2234
pbptyper add pbptyper module nf-core/modules#2005
pasty add module for pasty nf-core/modules#2003
snippy-core add snippy/core module nf-core/modules#1855
Mykrobe add module for mykrobe/predict nf-core/modules#1818
GenoTyphi add module for genotyphi/parse nf-core/modules#1818
Seroba add module for seroba nf-core/modules#1816
PlasmidFinder add plasmidfinder module nf-core/modules#1773
mcroni add mcroni module nf-core/modules#1750
Ariba add ariba module nf-core/modules#1731
snippy add snippy module nf-core/modules#1643
ShigaTyper add shigatyper module nf-core/modules#1548
panaroo add module for panaroo, fix pirate tests nf-core/modules#1444
Dragonflye Update dragonflye to latest version nf-core/modules#1442
Bakta update bakta to latest version (v1.4.0) nf-core/modules#1428
Roary Update test.yml for Roary module nf-core/modules#1419
HpsuisSero add hpsuisero module nf-core/modules#1331
SsuisSero add ssuisero module nf-core/modules#1329
SISTR add sistr module nf-core/modules#1322
RGI add rgi module nf-core/modules#1321
legsta add legsta module nf-core/modules#1319
AMRFinder+ add amrfindplus module nf-core/modules#1284
abricate add abricate module nf-core/modules#1280
mobsuite/recon add mobsuite/recon module nf-core/modules#1270
mash/dist add mash/dist module nf-core/modules#1193
Kleborate Fix kleborate inputs nf-core/modules#1172
nf-core/modules fix test data path for ClonalFrameML,roary,pirate nf-core/modules#1085
Bakta add bakta module nf-core/modules#1085
nf-core/modules use underscores in anchors and references nf-core/modules#1080
Scoary add scoary module nf-core/modules#1034
emmtyper add emmtyper module nf-core/modules#1028
LisSero add lissero module nf-core/modules#1026
ngmaster add ngmaster module nf-core/modules#1024
meningotype add meningotype module nf-core/modules#1022
SeqSero2 add seqsero2 module nf-core/modules#1016
ncbi-genome-download add ncbi-genome-download module nf-core/modules#980
ClonalFrameML add clonalframeml module nf-core/modules#974
AgrVATE Update agrvate version nf-core/modules#970
ECTyper add ectyper module nf-core/modules#948
TBProfiler add tbprofiler module nf-core/modules#947
spaTyper Update spatyper module (cleanup debug) nf-core/modules#938
hicap [fix] hicap module allow optional outputs nf-core/modules#937
fastq-scan add fastq-scan module nf-core/modules#935
csvtk patch output extension in csvtk/concat nf-core/modules#797
csvtk add csvtk/concat module nf-core/modules#785
spaTyper add spatyper module nf-core/modules#784
PIRATE add pirate module nf-core/modules#777
Roary add roary module nf-core/modules#776
ISMapper add ismapper module nf-core/modules#773
hicap add hicap module nf-core/modules#772
mashtree add mashtree module nf-core/modules#767
nf-core/modules update tests for 12 modules for new config nf-core/modules#758
AgrVATE Update agrvate to v1.0.1 nf-core/modules#728
staphopia-sccmec add staphopia-sccmec module nf-core/modules#702
Dragonflye add module for dragonflye nf-core/modules#633
nf-core/modules update tests for 21 modules for new config nf-core/modules#384
Prokka Update Prokka modules - add process label nf-core/modules#350
nf-core/modules README - Fix link describing process labels nf-core/modules#349
Shovill Update shovill module nf-core/modules#337
Prokka add prokka module nf-core/modules#298

Other Contributions

In addition to Bioconda and nf-core/modules, Bactopia has made 20 contributions to other tools, including:

Tool Description Pull Request
EToKi let tempfile determine where to put temp files lskatz/EToKi#2
EToKi Allow multiple path parameters on the configure step lskatz/EToKi#1
Seroba let tempfile determine temp dir location sanger-pathogens/seroba#68
pymummer allow the user to specify temp dir or use the system default sanger-pathogens/pymummer#36
ShigaTyper Fix install process CFSAN-Biostatistics/shigatyper#10
legsta use grep -q to play nice with bioconda docker build tseemann/legsta#17
ShigaTyper Add single-end and ONT support, add GitHub Actions, update readme CFSAN-Biostatistics/shigatyper#9
Ariba Ignore comments column and drop Bio.Alphabet sanger-pathogens/ariba#319
BioContainers Add ClonalFrameML and maskrc-svg multipackage BioContainers/multi-package-containers#1923
Kleborate Add --kaptive_path to specify path to kaptive data katholt/Kleborate#59
Ariba fix SPAdes version capture sanger-pathogens/ariba#315
AgrVATE Fix for dots in sample names VishnuRaghuram94/AgrVATE#9
PIRATE Add minimum feature length option SionBayliss/PIRATE#53
Ariba Fix for changes in PubMLST url sanger-pathogens/ariba#305
Ariba Solution 1: for fixing CARD download sanger-pathogens/ariba#302
bowtie2 Rename VERSION to BOWTIE2_VERSION BenLangmead/bowtie2#302
phyloFlash Improved single end support HRGV/phyloFlash#102
ISMapper set min_range and max_range args to be a float jhawkey/IS_mapper#38
maskrc-svg Add requirements.txt for python modules kwongj/maskrc-svg#2
Shovill Added shovill-se for processing single-end reads tseemann/shovill#105