Enhancements to Open Source Software
Maintaining open source software is a difficult challenge. It's often a time-consuming and completely voluntary process with little to no recognition. The field of bioinformatics is not immune to this. Many of the tools we use on a daily basis are maintained by individuals with little to no support. Bactopia is no different.
Being fully aware of these challenges when I first started developing Bactopia, I wanted there to be mechanisms to contribute back to the community. To achieve this, I implemented a few design requirements:
- Tools must be open source and free to use
- Tools must be available from Conda
- Bactopia Tools must be available on nf-core/modules
Bactopia has provided 156+ contributions to the bioinformatics community
- 10 stand-alone tools, each available from Bioconda
- 29 new Conda recipes, 35 updated recipes, and 1,750+ Bioconda pull requests reviewed.
- 62 contributions to nf-core/modules
- 20 contributions to other tools
These contributions are to the wider community, and do not require you to use Bactopia to take advantage of them.
Stand-Alone Tools
Occasionally, tools are developed for specific tasks in Bactopia. For example, Dragonflye was developed to add Nanopore support to Bactopia. When these tools are developed, they are built to be stand-alone. Below are 10 tools, originally built for Bactopia, that you can use outside of Bactopia.
Tool | Description |
---|---|
assembly-scan | Generate basic stats for an assembly |
dragonflye | Assemble bacterial isolate genomes from Nanopore reads |
fastq-dl | Download FASTQ files from SRA or ENA repositories. |
fastq-scan | Output FASTQ summary statistics in JSON format |
pasty | A tool for in silico serogrouping of Pseudomonas aeruginosa isolates |
pbptyper | In silico Penicillin Binding Protein typer for Streptococcus pneumoniae |
pmga | A fork of PMGA for all Neisseria species and Haemophilus influenzae |
shovill-se | A fork of Shovill that includes support for single end reads |
staphopia-sccmec | A standalone version of Staphopia’s SCCmec typing method |
vcf-annotator | Add biological annotations to variants in a given VCF file |
Bioconda Contributions
Bactopia requires that tools be installable with Conda to make installation easier for users. An unintended side effect of this has been a larger involvement with the Bioconda community. Bioconda is an amazing resource that doesn't end with `conda install`! For every recipe added to Bioconda, a Docker container is created by Biocontainers and a Singularity image is created by the Galaxy Project. At the end of the day, a single recipe makes a huge contribution to the community.
Bactopia has led to 29 new recipes, 35 updated recipes, and more than 1,000 reviewed pull requests.
New Recipes
Bactopia has led to the addition of 29 new recipes to Bioconda and conda-forge. These new recipes allow users to rapidly begin using these tools for their own analyses, and include:
Tool | Description | Pull Request |
---|---|---|
Aspera Connect | high-performance transfer client | anaconda/rpetit3 |
assembly-scan | Generate basic stats for an assembly | bioconda/bioconda-recipes#11425 |
bactopia | A flexible pipeline for complete analysis of bacterial genomes | bioconda/bioconda-recipes#17434 |
Dragonflye | Assemble bacterial isolate genomes from Nanopore reads | bioconda/bioconda-recipes#29696 |
ena-dl | Download FASTQ files from ENA | bioconda/bioconda-recipes#17354 |
EToKi | all methods related to Enterobase | bioconda/bioconda-recipes#37069 |
executor | programmer friendly Python subprocess wrapper | conda-forge/staged-recipes#9457 |
fastq-dl | Download FASTQ files from SRA or ENA repositories. | bioconda/bioconda-recipes#18252 |
fastq-scan | Output FASTQ summary statistics in JSON format | bioconda/bioconda-recipes#11415 |
GenoTyphi | assign genotypes to Salmonella Typhi genomes | bioconda/bioconda-recipes#25674 |
illumina-cleanup | A simple pipeline for pre-processing Illumina FASTQ files | bioconda/bioconda-recipes#11481 |
ISMapper | insertion sequence mapping software | bioconda/bioconda-recipes#14180 |
mashpit | Sketch-based surveillance platform | bioconda/bioconda-recipes#35199 |
NextPolish | Fast and accurately polish the genome generated by long reads | bioconda/bioconda-recipes#36582 |
ParallelTask | A simple and lightweight parallel task engine | conda-forge/staged-recipes#19616 |
pasty | A tool for in silico serogrouping of Pseudomonas aeruginosa isolates | bioconda/bioconda-recipes#35930 |
pbptyper | In silico Penicillin Binding Protein typer for Streptococcus pneumoniae | bioconda/bioconda-recipes#36222 |
pHierCC | Hierarchical clustering of cgMLST | bioconda/bioconda-recipes#37070 |
pmga | Command-line version of PMGA (PubMLST Genome Annotator) | bioconda/bioconda-recipes#32801 |
property-manager | useful property variants for Python programming | conda-forge/staged-recipes#9442 |
RFPlasmid | predicting plasmid contigs from assemblies | bioconda/bioconda-recipes#25849 |
SerotypeFinder | Identifies the serotype in total or partial sequenced isolates of E. coli | bioconda/bioconda-recipes#29718 |
shovill-se | A fork of Shovill that includes support for single end reads | bioconda/bioconda-recipes#26040 |
spaTyper | computational method for finding spa types | bioconda/bioconda-recipes#26044 |
sra-human-scrubber | Identify and remove human reads from FASTQ files | bioconda/bioconda-recipes#29926 |
staphopia-sccmec | A standalone version of Staphopia's SCCmec typing method | bioconda/bioconda-recipes#28214 |
tbl2asn-forever | use tbl2asn forever by pretending that it's still 2019 | bioconda/bioconda-recipes#20073 |
vcf-annotator | Add biological annotations to variants in a given VCF file | bioconda/bioconda-recipes#13417 |
Every recipe gets a Docker and Singularity container
It is sometimes overlooked, so it's important to reiterate: every recipe added to Bioconda has a Docker container created by Biocontainers and a Singularity container created by the Galaxy Project. These containers allow for version-controlled, reproducible analyses.
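As a rough illustration of what a single recipe makes available, the sketch below builds the three references a user would typically reach for. It assumes the usual public naming conventions (BioContainers images under `quay.io/biocontainers`, Singularity images on the Galaxy Project depot, and tags of the form `version--build`). The helper `container_refs` is hypothetical, and the version and build string are placeholders rather than a real release.

```python
# Sketch only: compose the Conda spec and container references for a
# Bioconda recipe. The registry prefixes are assumed conventions, and the
# version/build values used in the example are placeholders.

def container_refs(recipe: str, version: str, build: str) -> dict:
    """Return the Conda spec, Docker image, and Singularity image URL."""
    tag = f"{version}--{build}"  # container tags pair the version with the build string
    return {
        "conda": f"bioconda::{recipe}={version}",
        "docker": f"quay.io/biocontainers/{recipe}:{tag}",
        "singularity": f"https://depot.galaxyproject.org/singularity/{recipe}:{tag}",
    }


if __name__ == "__main__":
    # Placeholder version and build; look up the real tag before pulling.
    for kind, ref in container_refs("fastq-scan", "1.2.3", "0").items():
        print(f"{kind}: {ref}")
```

The resulting strings could then be passed to `conda install`, `docker pull`, or a Singularity download, once the placeholder tag has been swapped for one that actually exists for the recipe.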
Enhancements and Fixes
A common issue with Bioconda recipes is that a tool works great in a Conda environment but fails for various reasons once containerized. When these issues occur with a tool used by Bactopia, an effort is made to improve or fix the Bioconda recipe. Below is a list of fixes and improvements to some Bioconda recipes:
Tool | Description | Pull Request |
---|---|---|
pggb | Update pinnings in pggb | bioconda/bioconda-recipes#35734 |
Nullarbor | Rebuild nullarbor container | bioconda/bioconda-recipes#35687 |
GenoTyphi | Update genotyphi recipe for mykrobe based analysis | bioconda/bioconda-recipes#35388 |
Seroba | Add database to Seroba recipe | bioconda/bioconda-recipes#35378 |
Ariba | Update ariba dependencies for latest pymummer | bioconda/bioconda-recipes#35383 |
pymummer | patch pymummer recipe to use system/user TMP | bioconda/bioconda-recipes#35379 |
PlasmidFinder | Update PlasmidFinder for better container support | bioconda/bioconda-recipes#35314 |
GTDB-Tk | Allow GTDB-Tk database download with container | bioconda/bioconda-recipes#35174 |
ShigaTyper | update shigatyper recipe for better container support | bioconda/bioconda-recipes#35161 |
FastANI | Remove fastani from build fail list | bioconda/bioconda-recipes#33556 |
FastANI | update FastANI recipe | bioconda/bioconda-recipes#33433 |
Prokka | Update Prokka bioperl pinning | bioconda/bioconda-recipes#33411 |
SsuisSero | update SsuisSero dependency | bioconda/bioconda-recipes#33268 |
RGI | Improve RGI docker container | bioconda/bioconda-recipes#33249 |
legsta | Improve dockerbuild for Legsta | bioconda/bioconda-recipes#33246 |
fastq-scan | Update fastq-scan recipe to include jq | bioconda/bioconda-recipes#32650 |
Ariba | Patch ariba recipe with minor bug fixes | bioconda/bioconda-recipes#32258 |
PIRATE | Update PIRATE recipe to include post-analysis scripts | bioconda/bioconda-recipes#31629 |
ngmaster | rebuild ngmaster to get docker container | bioconda/bioconda-recipes#31376 |
AgrVATE | add missing dependency for agrvate | bioconda/bioconda-recipes#31035 |
spaTyper | Patch spatyper for entrypoint support | bioconda/bioconda-recipes#30824 |
spaTyper | Patch spatyper for better container support | bioconda/bioconda-recipes#30622 |
Kleborate | Update kleborate recipe to build DB | bioconda/bioconda-recipes#30582 |
cyvcf2 | Loosen htslib version requirement for cyvcf2 | bioconda/bioconda-recipes#30044 |
Kleborate | Patch Kleborate's method for discovering Kaptive | bioconda/bioconda-recipes#29623 |
spaTyper | update spatyper - drop blake_sha256 requirement | bioconda/bioconda-recipes#27321 |
ISMapper | ISMapper - Fix BioPython pinning | bioconda/bioconda-recipes#26599 |
CheckM | checkm-genome - fix broken pinning by older pysam version | bioconda/bioconda-recipes#25856 |
ISMapper | Update ISMapper - Pin BioPython version | bioconda/bioconda-recipes#24314 |
Ariba | Patches for third party links used by Ariba | bioconda/bioconda-recipes#24010 |
Seroba | Add pysam pinning for Seroba | bioconda/bioconda-recipes#17568 |
Ariba | Update pysam pinning for Ariba | bioconda/bioconda-recipes#17448 |
tbl2asn | Previous version of tbl2asn has expired, updated to 25.7 | bioconda/bioconda-recipes#16131 |
ISMapper | Rebuild ismapper for GCC7 migration | bioconda/bioconda-recipes#14276 |
MentaLiST | MentaLiST v0.2.4 patch for Julia | bioconda/bioconda-recipes#13137 |
nf-core/modules Contributions
When Bactopia was converted to Nextflow DSL2, it became possible to adopt modules from nf-core/modules. These modules are easy for users to include in their own Nextflow DSL2 pipelines. To facilitate this, a requirement was made that each Bactopia Tool (a separate workflow) must have a corresponding module available from nf-core/modules, and if one is not available, it is added.
By adopting this practice, there have been 62 contributions to nf-core/modules in the form of new modules, module updates, and testing adjustments.
Other Contributions
In addition to Bioconda and nf-core/modules, Bactopia has made 20 contributions to other tools including:
Tool | Description | Pull Request |
---|---|---|
EToKi | let tempfile determine where to put temp files | lskatz/EToKi#2 |
EToKi | Allow multiple path parameters on the configure step | lskatz/EToKi#1 |
Seroba | let tempfile determine temp dir location | sanger-pathogens/seroba#68 |
pymummer | allow the user to specify temp dir or use the system default | sanger-pathogens/pymummer#36 |
ShigaTyper | Fix install process | CFSAN-Biostatistics/shigatyper#10 |
legsta | use grep -q to play nice with bioconda docker build | tseemann/legsta#17 |
ShigaTyper | Add single-end and ONT support, add GitHub Actions, update readme | CFSAN-Biostatistics/shigatyper#9 |
Ariba | Ignore comments column and drop Bio.Alphabet | sanger-pathogens/ariba#319 |
BioContainers | Add ClonalFrameML and maskrc-svg multipackage | BioContainers/multi-package-containers#1923 |
Kleborate | Add --kaptive_path to specify path to kaptive data | katholt/Kleborate#59 |
Ariba | fix SPAdes version capture | sanger-pathogens/ariba#315 |
AgrVATE | Fix for dots in sample names | VishnuRaghuram94/AgrVATE#9 |
PIRATE | Add minimum feature length option | SionBayliss/PIRATE#53 |
Ariba | Fix for changes in PubMLST url | sanger-pathogens/ariba#305 |
Ariba | Solution 1: for fixing CARD download | sanger-pathogens/ariba#302 |
bowtie2 | Rename VERSION to BOWTIE2_VERSION | BenLangmead/bowtie2#302 |
phyloFlash | Improved single end support | HRGV/phyloFlash#102 |
ISMapper | set min_range and max_range args to be a float | jhawkey/IS_mapper#38 |
maskrc-svg | Add requirements.txt for python modules | kwongj/maskrc-svg#2 |
Shovill | Added shovill-se for processing single-end reads | tseemann/shovill#105 |