Enhancements to Open-Source Software¶
Sustaining open source software is a difficult challenge that often demands substantial time and effort, usually without the benefit of recognition or support. The field of bioinformatics is no exception, as it heavily depends on tools maintained by individuals with little to no support. Bactopia is no different.
Recognizing these challenges, I designed Bactopia with an explicit goal of giving back to the community. To fulfill this aim, I incorporated several key design requirements:
- Tools must be open source and free to use.
- Tools must be available from Conda.
- Bactopia Tools must be available on nf-core/modules.
Bactopia has provided 176+ contributions to the bioinformatics community:
- 11 stand-alone tools, each available from Bioconda
- 30 new Conda recipes, 41 updated recipes, and 2,000+ Bioconda pull requests reviewed.
- 68 contributions to nf-core/modules
- 26 contributions to other tools
These contributions benefit the wider community, and you do not need to use Bactopia to take advantage of them.
Stand-Alone Tools¶
Sometimes tools are developed to enhance Bactopia's capabilities, such as Dragonflye, which was developed to add Nanopore support. These tools are designed to function on their own. Below are 11 such tools, originally built for Bactopia, that you can also use independently of Bactopia.
Tool | Description |
---|---|
assembly-scan | Generate basic stats for an assembly |
dragonflye | Assemble bacterial isolate genomes from Nanopore reads |
fastq-dl | Download FASTQ files from SRA or ENA repositories. |
fastq-scan | Output FASTQ summary statistics in JSON format |
GOBLIN | Generate trusted prOteins to supplement BacteriaL annotatIoN |
pasty | A tool for in silico serogrouping of Pseudomonas aeruginosa isolates |
pbptyper | In silico Penicillin Binding Protein typer for Streptococcus pneumoniae |
pmga | A fork of PMGA for all Neisseria species and Haemophilus influenzae |
shovill-se | A fork of Shovill that includes support for single end reads |
staphopia-sccmec | A standalone version of Staphopia’s SCCmec typing method |
vcf-annotator | Add biological annotations to variants in a given VCF file |
Bioconda Contributions¶
Bactopia requires that tools be installable with Conda to simplify the installation process for users. This requirement led to an unintended, but welcome, deeper involvement with the Bioconda community. Bioconda is more than `conda install`; it is a valuable resource that makes bioinformatics tools more accessible to the community. Every time a tool is added to Bioconda, a Docker container is created by Biocontainers and a Singularity image is created by the Galaxy Project. In essence, a single recipe contributes significantly to the broader community.
Bactopia has led to 30 new recipes, 41 updated recipes, and more than 2,000 reviewed pull requests.
New Recipes¶
Bactopia has led to the addition of 30 new recipes to Bioconda and conda-forge. These new recipes allow users to rapidly begin using these tools for their own analyses, and include:
Tool | Description | Pull Request |
---|---|---|
Aspera Connect | high-performance transfer client | anaconda/rpetit3 |
assembly-scan | Generate basic stats for an assembly | bioconda/bioconda-recipes#11425 |
bactopia | A flexible pipeline for complete analysis of bacterial genomes | bioconda/bioconda-recipes#17434 |
Dragonflye | Assemble bacterial isolate genomes from Nanopore reads | bioconda/bioconda-recipes#29696 |
ena-dl | Download FASTQ files from ENA | bioconda/bioconda-recipes#17354 |
EToKi | all methods related to Enterobase | bioconda/bioconda-recipes#37069 |
executor | programmer friendly Python subprocess wrapper | conda-forge/staged-recipes#9457 |
fastq-dl | Download FASTQ files from SRA or ENA repositories. | bioconda/bioconda-recipes#18252 |
fastq-scan | Output FASTQ summary statistics in JSON format | bioconda/bioconda-recipes#11415 |
GenoTyphi | assign genotypes to Salmonella Typhi genomes | bioconda/bioconda-recipes#25674 |
GOBLIN | Generate trusted prOteins to supplement BacteriaL annotatIoN | bioconda/bioconda-recipes#38922 |
illumina-cleanup | A simple pipeline for pre-processing Illumina FASTQ files | bioconda/bioconda-recipes#11481 |
ISMapper | insertion sequence mapping software | bioconda/bioconda-recipes#14180 |
mashpit | Sketch-based surveillance platform | bioconda/bioconda-recipes#35199 |
NextPolish | Fast and accurately polish the genome generated by long reads | bioconda/bioconda-recipes#36582 |
ParallelTask | A simple and lightweight parallel task engine | conda-forge/staged-recipes#19616 |
pasty | A tool for in silico serogrouping of Pseudomonas aeruginosa isolates | bioconda/bioconda-recipes#35930 |
pbptyper | In silico Penicillin Binding Protein typer for Streptococcus pneumoniae | bioconda/bioconda-recipes#36222 |
pHierCC | Hierarchical clustering of cgMLST | bioconda/bioconda-recipes#37070 |
pmga | Command-line version of PMGA (PubMLST Genome Annotator) | bioconda/bioconda-recipes#32801 |
property-manager | useful property variants for Python programming | conda-forge/staged-recipes#9442 |
RFPlasmid | predicting plasmid contigs from assemblies | bioconda/bioconda-recipes#25849 |
SerotypeFinder | Identifies the serotype in total or partial sequenced isolates of E. coli | bioconda/bioconda-recipes#29718 |
shovill-se | A fork of Shovill that includes support for single end reads | bioconda/bioconda-recipes#26040 |
spaTyper | computational method for finding spa types | bioconda/bioconda-recipes#26044 |
sra-human-scrubber | Identify and remove human reads from FASTQ files | bioconda/bioconda-recipes#29926 |
staphopia-sccmec | A standalone version of Staphopia's SCCmec typing method | bioconda/bioconda-recipes#28214 |
tbl2asn-forever | use tbl2asn forever by pretending that it's still 2019 | bioconda/bioconda-recipes#20073 |
vcf-annotator | Add biological annotations to variants in a given VCF file | bioconda/bioconda-recipes#13417 |
Every recipe gets a Docker and Singularity container
Although it is sometimes overlooked, it is important to reiterate: every recipe added to Bioconda has a Docker container created by Biocontainers and a Singularity container created by the Galaxy Project. These containers allow for version-controlled, reproducible analyses.
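To make the recipe-to-container relationship concrete, here is a minimal Python sketch that builds the three references a single Bioconda recipe typically yields. It is illustrative only: the `container_refs` helper is hypothetical, the naming patterns (`quay.io/biocontainers/...` and `https://depot.galaxyproject.org/singularity/...`) are assumptions based on commonly used conventions rather than anything stated on this page, and the version and build string are placeholders, so look up the real values for the package you need.

```python
# Sketch: derive install and container references from a Bioconda package spec.
# Assumptions (not taken from this page): Biocontainers images are published
# under quay.io/biocontainers and Galaxy Project Singularity images under
# depot.galaxyproject.org, both tagged as "<name>:<version>--<build>".


def container_refs(name: str, version: str, build: str) -> dict:
    """Return Conda, Docker, and Singularity references for one package build."""
    tag = f"{name}:{version}--{build}"
    return {
        "conda": f"bioconda::{name}={version}",
        "docker": f"quay.io/biocontainers/{tag}",
        "singularity": f"https://depot.galaxyproject.org/singularity/{tag}",
    }


if __name__ == "__main__":
    # Placeholder version and build string, for illustration only.
    refs = container_refs("fastq-scan", "1.0.1", "h0000000_0")
    for kind, ref in refs.items():
        print(f"{kind:<12}{ref}")
```

Pinning the full `<version>--<build>` tag, rather than a floating tag, is what lets the same analysis be repeated later with the same software.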
Enhancements and Fixes¶
A common issue with Bioconda recipes is that a tool works well in a Conda environment but fails for various reasons when containerized. When these issues occur with a tool used by Bactopia, an effort is made to improve or fix the Bioconda recipe. Below is a list of fixes and improvements to Bioconda recipes:
Tool | Description | Pull Request |
---|---|---|
ncbi-genome-download | Patch ncbi-genome-download recipe | bioconda/bioconda-recipes#41640 |
GTDB-Tk | Update GTDB-tk recipe | bioconda/bioconda-recipes#40333 |
mlst | update midas pinnings to match docs | bioconda/bioconda-recipes#38826 |
MIDAS | update midas pinnings to match docs | bioconda/bioconda-recipes#38566 |
smoove | rebuild smoove container | bioconda/bioconda-recipes#37394 |
fasta3 | update fasta3 to latest version | bioconda/bioconda-recipes#37306 |
pggb | Update pinnings in pggb | bioconda/bioconda-recipes#35734 |
Nullarbor | Rebuild nullarbor container | bioconda/bioconda-recipes#35687 |
GenoTyphi | Update genotyphi recipe for mykrobe based analysis | bioconda/bioconda-recipes#35388 |
Seroba | Add database to Seroba recipe | bioconda/bioconda-recipes#35378 |
Ariba | Update ariba dependencies for latest pymummer | bioconda/bioconda-recipes#35383 |
pymummer | patch pymummer recipe to use system/user TMP | bioconda/bioconda-recipes#35379 |
PlasmidFinder | Update PlasmidFinder for better container support | bioconda/bioconda-recipes#35314 |
GTDB-Tk | Allow GTDB-Tk database download with container | bioconda/bioconda-recipes#35174 |
ShigaTyper | update shigatyper recipe for better container support | bioconda/bioconda-recipes#35161 |
FastANI | Remove fastani from build fail list | bioconda/bioconda-recipes#33556 |
FastANI | update FastANI recipe | bioconda/bioconda-recipes#33433 |
Prokka | Update Prokka bioperl pinning | bioconda/bioconda-recipes#33411 |
SsuisSero | update SsuisSero dependency | bioconda/bioconda-recipes#33268 |
RGI | Improve RGI docker container | bioconda/bioconda-recipes#33249 |
legsta | Improve dockerbuild for Legsta | bioconda/bioconda-recipes#33246 |
fastq-scan | Update fastq-scan recipe to include jq | bioconda/bioconda-recipes#32650 |
Ariba | Patch ariba recipe with minor bug fixes | bioconda/bioconda-recipes#32258 |
PIRATE | Update PIRATE recipe to include post-analysis scripts | bioconda/bioconda-recipes#31629 |
ngmaster | rebuild ngmaster to get docker container | bioconda/bioconda-recipes#31376 |
AgrVATE | add missing dependency for agrvate | bioconda/bioconda-recipes#31035 |
spaTyper | Patch spatyper for entrypoint support | bioconda/bioconda-recipes#30824 |
spaTyper | Patch spatyper for better container support | bioconda/bioconda-recipes#30622 |
Kleborate | Update kleborate recipe to build DB | bioconda/bioconda-recipes#30582 |
cyvcf2 | Loosen htslib version requirement for cyvcf2 | bioconda/bioconda-recipes#30044 |
Kleborate | Patch Kleborate's method for discovering Kaptive | bioconda/bioconda-recipes#29623 |
spaTyper | update spatyper - drop blake_sha256 requirement | bioconda/bioconda-recipes#27321 |
ISMapper | ISMapper - Fix BioPython pinning | bioconda/bioconda-recipes#26599 |
CheckM | checkm-genome - fix broken pinning by older pysam version | bioconda/bioconda-recipes#25856 |
ISMapper | Update ISMapper - Pin BioPython version | bioconda/bioconda-recipes#24314 |
Ariba | Patches for third party links used by Ariba | bioconda/bioconda-recipes#24010 |
Seroba | Add pysam pinning for Seroba | bioconda/bioconda-recipes#17568 |
Ariba | Update pysam pinning for Ariba | bioconda/bioconda-recipes#17448 |
tbl2asn | Previous version of tbl2asn has expired, updated to 25.7 | bioconda/bioconda-recipes#16131 |
ISMapper | Rebuild ismapper for GCC7 migration | bioconda/bioconda-recipes#14276 |
MentaLiST | MentaLiST v0.2.4 patch for Julia | bioconda/bioconda-recipes#13137 |
nf-core/modules Contributions¶
When Bactopia transitioned to Nextflow DSL2, it opened the door to adopting modules from nf-core/modules. Users can integrate these modules into their own Nextflow DSL2 pipelines. To support this integration, I decided to require that each Bactopia Tool have a corresponding module available from nf-core/modules. If such a module is not already available, it will be added.
By adopting this practice, there have been 68 contributions to nf-core/modules in the form of new modules, module updates, and testing adjustments.
Other Contributions¶
In addition to Bioconda and nf-core/modules, Bactopia has made 26 contributions to other tools, including:
Tool | Description | Pull Request |
---|---|---|
MOB-suite | fix hostrange() missing 1 required positional argument: 'database_directory' | phac-nml/mob-suite#149 |
bioconda-utils | chore: update change visibility action | bioconda/bioconda-utils#873 |
Prokka | Convert Travis CI to Github Actions | tseemann/prokka#662 |
bioconda-utils | chore: add CI to changevisibility of private containers | bioconda/bioconda-utils#835 |
bioconda-containers | Patch - small fix on merge command and quay toggle visibility | bioconda/bioconda-containers#54 |
Shigatyper | Incorporate patches from Bioconda | CFSAN-Biostatistics/shigatyper#14 |
EToKi | let tempfile determine where to put temp files | lskatz/EToKi#2 |
EToKi | Allow multiple path parameters on the configure step | lskatz/EToKi#1 |
Seroba | let tempfile determine temp dir location | sanger-pathogens/seroba#68 |
pymummer | allow the user to specify temp dir or use the system default | sanger-pathogens/pymummer#36 |
ShigaTyper | Fix install process | CFSAN-Biostatistics/shigatyper#10 |
legsta | use grep -q to play nice with bioconda docker build | tseemann/legsta#17 |
ShigaTyper | Add single-end and ONT support, add GitHub Actions, update readme | CFSAN-Biostatistics/shigatyper#9 |
Ariba | Ignore comments column and drop Bio.Alphabet | sanger-pathogens/ariba#319 |
BioContainers | Add ClonalFrameML and maskrc-svg multipackage | BioContainers/multi-package-containers#1923 |
Kleborate | Add --kaptive_path to specify path to kaptive data | katholt/Kleborate#59 |
Ariba | fix SPAdes version capture | sanger-pathogens/ariba#315 |
AgrVATE | Fix for dots in sample names | VishnuRaghuram94/AgrVATE#9 |
PIRATE | Add minimum feature length option | SionBayliss/PIRATE#53 |
Ariba | Fix for changes in PubMLST url | sanger-pathogens/ariba#305 |
Ariba | Solution 1: for fixing CARD download | sanger-pathogens/ariba#302 |
bowtie2 | Rename VERSION to BOWTIE2_VERSION | BenLangmead/bowtie2#302 |
phyloFlash | Improved single end support | HRGV/phyloFlash#102 |
ISMapper | set min_range and max_range args to be a float | jhawkey/IS_mapper#38 |
maskrc-svg | Add requirements.txt for python modules | kwongj/maskrc-svg#2 |
Shovill | Added shovill-se for processing single-end reads | tseemann/shovill#105 |