Sketcher
The sketcher module uses Mash and Sourmash to create sketches and query RefSeq and GTDB.
Output Overview
Below is the default output structure for the sketcher step in Bactopia. Where possible, the file descriptions below were adapted from each tool's own description.
<BACTOPIA_DIR>
├── <SAMPLE_NAME>
│   └── main
│       └── sketcher
│           ├── logs
│           │   ├── nf-sketcher.{begin,err,log,out,run,sh,trace}
│           │   └── versions.yml
│           ├── <SAMPLE_NAME>-k{21|31}.msh
│           ├── <SAMPLE_NAME>-mash-refseq88-k21.txt
│           ├── <SAMPLE_NAME>-sourmash-gtdb-rs207-k31.txt
│           └── <SAMPLE_NAME>.sig
└── bactopia-runs
    └── bactopia-<TIMESTAMP>
        └── nf-reports
            ├── bactopia-dag.dot
            ├── bactopia-report.html
            ├── bactopia-timeline.html
            └── bactopia-trace.txt
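The layout above can be expressed as a small path helper, for example to check that a run produced every expected file. This is a minimal sketch and not part of Bactopia: the function names `sketcher_outputs` and `missing_outputs` and the dictionary keys are hypothetical, and it assumes that `k{21|31}` expands to two separate `.msh` files.

```python
from pathlib import Path


def sketcher_outputs(bactopia_dir, sample):
    """Return the expected per-sample sketcher output paths (hypothetical helper)."""
    base = Path(bactopia_dir) / sample / "main" / "sketcher"
    return {
        # Assumption: <SAMPLE_NAME>-k{21|31}.msh means one file per k-mer size.
        "mash_k21": base / f"{sample}-k21.msh",
        "mash_k31": base / f"{sample}-k31.msh",
        "mash_refseq": base / f"{sample}-mash-refseq88-k21.txt",
        "sourmash_gtdb": base / f"{sample}-sourmash-gtdb-rs207-k31.txt",
        "sourmash_sig": base / f"{sample}.sig",
        "versions": base / "logs" / "versions.yml",
    }


def missing_outputs(bactopia_dir, sample):
    """List the keys of expected outputs that do not exist on disk."""
    expected = sketcher_outputs(bactopia_dir, sample)
    return [name for name, path in expected.items() if not path.exists()]
```

Calling `missing_outputs("<BACTOPIA_DIR>", "<SAMPLE_NAME>")` with real values returns an empty list when all of the files listed in the tree are present.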
Results
sketcher
Below is a description of the per-sample results from the sketcher subworkflow.
Filename | Description |
---|---|
<SAMPLE_NAME>-k{21|31}.msh | A Mash sketch of the input assembly for k=21 and k=31 |
<SAMPLE_NAME>-mash-refseq88-k21.txt | The results of querying the Mash sketch against RefSeq88 |
<SAMPLE_NAME>-sourmash-gtdb-rs207-k31.txt | The results of querying the Sourmash sketch against GTDB-rs207 |
<SAMPLE_NAME>.sig | A Sourmash sketch of the input assembly for k=21, k=31, and k=51 |
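The Mash query results can be loaded with a few lines of Python. The sketch below assumes the file follows the standard tab-separated `mash screen` column order (identity, shared hashes, median multiplicity, p-value, query ID, query comment) and may or may not start with a header line; the function name `read_mash_screen` is hypothetical, and its `min_identity` default of 0.8 simply mirrors the `--screen_i` default for illustration.

```python
import io


def read_mash_screen(handle, min_identity=0.8):
    """Parse mash-screen-style rows and return hits sorted by identity (descending)."""
    hits = []
    for line in handle:
        # Split into at most six fields so tabs inside the comment are preserved.
        fields = line.rstrip("\n").split("\t", 5)
        if len(fields) < 6:
            continue
        try:
            identity = float(fields[0])
        except ValueError:
            continue  # most likely a header line
        if identity < min_identity:
            continue
        shared, total = fields[1].split("/")
        hits.append({
            "identity": identity,
            "shared_hashes": int(shared),
            "total_hashes": int(total),
            "median_multiplicity": float(fields[2]),
            "p_value": float(fields[3]),
            "query_id": fields[4],
            "query_comment": fields[5],
        })
    hits.sort(key=lambda hit: hit["identity"], reverse=True)
    return hits


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not real results.
    example = io.StringIO(
        "0.99\t810/1000\t1\t0\tEXAMPLE_ID_1\texample comment\n"
        "0.70\t1/1000\t1\t0.02\tEXAMPLE_ID_2\tanother comment\n"
    )
    for hit in read_mash_screen(example):
        print(hit["query_id"], hit["identity"])
```

For a real file, pass an open handle instead, for example `open("<SAMPLE_NAME>-mash-refseq88-k21.txt")`.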
Audit Trail
Below are files that can assist you in understanding which parameters and program versions were used.
Logs
Each process that is executed has a folder named logs. This folder contains files that are helpful to review if the need ever arises.
Extension | Description |
---|---|
.begin | An empty file used to designate the process started |
.err | Contains STDERR outputs from the process |
.log | Contains both STDERR and STDOUT outputs from the process |
.out | Contains STDOUT outputs from the process |
.run | The script Nextflow uses to stage/unstage files and queue processes based on the given profile |
.sh | The script executed by bash for the process |
.trace | The Nextflow Trace report for the process |
versions.yml | A YAML formatted file with program versions |
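When reviewing many samples, the log files in the table above can be gathered programmatically. The following is a minimal sketch under stated assumptions: the helper names `collect_logs` and `has_stderr` are hypothetical, and the `nf-sketcher` prefix is taken from the output tree shown earlier. A non-empty `.err` file does not necessarily indicate a failure, since many tools write routine messages to STDERR.

```python
from pathlib import Path

# Extensions documented in the logs table.
LOG_EXTENSIONS = ("begin", "err", "log", "out", "run", "sh", "trace")


def collect_logs(logs_dir, process="nf-sketcher"):
    """Map each documented extension to its log file, if that file exists."""
    logs_dir = Path(logs_dir)
    found = {}
    for ext in LOG_EXTENSIONS:
        path = logs_dir / f"{process}.{ext}"
        if path.is_file():
            found[ext] = path
    return found


def has_stderr(logs_dir, process="nf-sketcher"):
    """Return True when the process wrote anything to its .err file."""
    err = collect_logs(logs_dir, process).get("err")
    return err is not None and err.stat().st_size > 0
```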
Parameters
Sketcher
Parameter | Description |
---|---|
--sketch_size | Sketch size. Each sketch will have at most this many non-redundant min-hashes. Type: integer, Default: 10000 |
--sourmash_scale | Choose number of hashes as 1 in FRACTION of input k-mers. Type: integer, Default: 10000 |
--no_winner_take_all | Disable winner-takes-all strategy for identity estimates. Type: boolean |
--screen_i | Minimum identity to report. Type: number, Default: 0.8 |
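The two size parameters work differently: `--sketch_size` caps the number of hashes in a Mash sketch, while `--sourmash_scale` keeps roughly 1 in FRACTION of the input k-mers, so the Sourmash sketch grows with the genome. The arithmetic can be sketched as follows. The function names are hypothetical, the number of distinct k-mers is an input you would have to estimate yourself (genome length is a rough upper bound), and the Sourmash figure is an expected value rather than an exact count.

```python
def mash_sketch_hashes(distinct_kmers, sketch_size=10000):
    """Upper bound on hashes in a Mash sketch: at most `sketch_size`."""
    return min(distinct_kmers, sketch_size)


def sourmash_expected_hashes(distinct_kmers, scale=10000):
    """Expected hashes in a Sourmash sketch that keeps 1 in `scale` k-mers."""
    return distinct_kmers / scale


if __name__ == "__main__":
    # Illustrative value only: about 5 million distinct k-mers.
    kmers = 5_000_000
    print("Mash hashes (at most):", mash_sketch_hashes(kmers))
    print("Sourmash hashes (expected):", sourmash_expected_hashes(kmers))
```

With the defaults, a genome of this size fills the Mash sketch completely, while the Sourmash sketch holds on the order of a few hundred hashes.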
Citations
If you use Bactopia and sketcher in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Genome Taxonomy Database
  Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, Hugenholtz P. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research gkab776 (2021)
- Mash
  Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132 (2016)
- Mash
  Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM. Mash Screen: high-throughput sequence containment estimation for genome discovery. Genome Biol 20, 232 (2019)
- NCBI RefSeq Database
  O'Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O'Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy TD, Pruitt KD. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–45 (2016)
- Sourmash
  Brown CT, Irber L. sourmash: a library for MinHash sketching of DNA. JOSS 1, 27 (2016)