
Sketcher

The sketcher module uses Mash and Sourmash to create sketches and query RefSeq and GTDB.

Output Overview

Below is the default output structure for the sketcher step in Bactopia. Where possible, the file descriptions below were adapted from each tool's own description.

<BACTOPIA_DIR>
├── <SAMPLE_NAME>
│   └── main
│       └── sketcher
│           ├── logs
│           │   ├── nf-sketcher.{begin,err,log,out,run,sh,trace}
│           │   └── versions.yml
│           ├── <SAMPLE_NAME>-k{21|31}.msh
│           ├── <SAMPLE_NAME>-mash-refseq88-k21.txt
│           ├── <SAMPLE_NAME>-sourmash-gtdb-rs207-k31.txt
│           └── <SAMPLE_NAME>.sig
└── bactopia-runs
    └── bactopia-<TIMESTAMP>
        └── nf-reports
            ├── bactopia-dag.dot
            ├── bactopia-report.html
            ├── bactopia-timeline.html
            └── bactopia-trace.txt
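To check that every expected sketcher output was written for a sample, a small Python sketch can build the paths from the tree above. It assumes the default file names shown there (RefSeq88 and GTDB rs207 databases, Mash sketches at k=21 and k=31). `sketcher_outputs` and `missing_outputs` are hypothetical helpers, not part of Bactopia.

```python
from pathlib import Path


def sketcher_outputs(bactopia_dir: str, sample: str) -> dict[str, Path]:
    """Return the expected sketcher output paths for one sample.

    Assumes the default layout and names from the output tree; other
    database versions or k-mer sizes would change these file names.
    """
    base = Path(bactopia_dir) / sample / "main" / "sketcher"
    return {
        "mash_k21": base / f"{sample}-k21.msh",
        "mash_k31": base / f"{sample}-k31.msh",
        "mash_refseq": base / f"{sample}-mash-refseq88-k21.txt",
        "sourmash_gtdb": base / f"{sample}-sourmash-gtdb-rs207-k31.txt",
        "sourmash_sig": base / f"{sample}.sig",
        "versions": base / "logs" / "versions.yml",
    }


def missing_outputs(bactopia_dir: str, sample: str) -> list[str]:
    """List the keys of expected outputs that are not present on disk."""
    return [
        key
        for key, path in sketcher_outputs(bactopia_dir, sample).items()
        if not path.exists()
    ]
```

For example, `missing_outputs("/path/to/bactopia", "SAMPLE01")` returns an empty list when all of the files above exist.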

Results

sketcher

Below is a description of the per-sample results from the sketcher subworkflow.

Filename Description
<SAMPLE_NAME>-k{21|31}.msh Mash sketches of the input assembly, one file each for k=21 and k=31
<SAMPLE_NAME>-mash-refseq88-k21.txt The results of querying the Mash sketch against RefSeq88
<SAMPLE_NAME>-sourmash-gtdb-rs207-k31.txt The results of querying the Sourmash sketch against GTDB-rs207
<SAMPLE_NAME>.sig A Sourmash sketch of the input assembly for k=21, k=31, and k=51
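The --screen_i and --no_winner_take_all parameters are Mash screen options, which suggests the RefSeq88 results file holds `mash screen` output. The sketch below assumes exactly that: tab-separated rows with Mash screen's columns (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment). Because it is not stated here whether Bactopia adds a header or re-sorts the rows, the parser skips non-numeric rows and sorts the hits itself. `ScreenHit`, `parse_mash_screen_lines`, and `read_mash_screen` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ScreenHit:
    identity: float
    shared_hashes: int
    total_hashes: int
    median_multiplicity: int
    p_value: float
    query_id: str
    query_comment: str


def parse_mash_screen_lines(lines: Iterable[str], min_identity: float = 0.0) -> list[ScreenHit]:
    """Parse tab-separated mash screen rows, returning the best identity first."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t", 5)
        if len(fields) < 5:
            continue  # blank or malformed line
        try:
            shared, total = fields[1].split("/")  # e.g. "980/1000"
            hit = ScreenHit(
                identity=float(fields[0]),
                shared_hashes=int(shared),
                total_hashes=int(total),
                median_multiplicity=int(fields[2]),
                p_value=float(fields[3]),
                query_id=fields[4],
                query_comment=fields[5] if len(fields) > 5 else "",
            )
        except ValueError:
            continue  # non-numeric row, such as a header
        if hit.identity >= min_identity:
            hits.append(hit)
    # Highest identity first; ties broken by more shared hashes.
    hits.sort(key=lambda h: (h.identity, h.shared_hashes), reverse=True)
    return hits


def read_mash_screen(path: str, min_identity: float = 0.0) -> list[ScreenHit]:
    """Read and parse a mash screen results file."""
    with open(path) as handle:
        return parse_mash_screen_lines(handle, min_identity)
```

For example, `read_mash_screen("SAMPLE01-mash-refseq88-k21.txt", min_identity=0.9)[:5]` would return the five highest-identity RefSeq matches at or above 90% identity.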

Audit Trail

Below are files that can assist you in understanding which parameters and program versions were used.

Logs

Each executed process has its own folder named logs. It contains files that are helpful for troubleshooting if the need ever arises.

Extension Description
.begin An empty file used to indicate that the process started
.err Contains STDERR outputs from the process
.log Contains both STDERR and STDOUT outputs from the process
.out Contains STDOUT outputs from the process
.run The script Nextflow uses to stage/unstage files and queue processes based on the given profile
.sh The script executed by bash for the process
.trace The Nextflow Trace report for the process
versions.yml A YAML formatted file with program versions
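When comparing runs, it can help to pull the tool versions out of versions.yml programmatically. This minimal sketch assumes the nf-core-style layout: an unindented (possibly quoted) process name ending in a colon, followed by indented `tool: version` lines. That layout is an assumption, and `parse_versions` is a hypothetical helper; it uses only the standard library, so it does not handle general YAML.

```python
def parse_versions(text: str) -> dict[str, dict[str, str]]:
    """Parse a two-level versions.yml into {process: {tool: version}}."""
    versions: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if not raw[0].isspace():
            # Unindented line: a process name such as "PROCESS:NAME":
            current = stripped.rstrip(":").strip().strip("\"'")
            versions[current] = {}
        elif current is not None:
            # Indented line: "tool: version" belonging to the current process.
            tool, _, version = stripped.partition(":")
            versions[current][tool.strip()] = version.strip().strip("\"'")
    return versions
```

For example, `parse_versions(open("logs/versions.yml").read())` maps each process to its tool versions. If PyYAML is installed, `yaml.safe_load` is the more robust choice for arbitrary YAML.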

Parameters

Sketcher

Parameter Description
--sketch_size Sketch size. Each sketch will have at most this many non-redundant min-hashes.
Type: integer, Default: 10000
--sourmash_scale Sourmash scale factor: keep roughly one hash for every N input k-mers, where N is this value
Type: integer, Default: 10000
--no_winner_take_all Disable the winner-takes-all strategy for identity estimates
Type: boolean
--screen_i Minimum identity to report.
Type: number, Default: 0.8
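--sketch_size and --sourmash_scale control two different ways of subsampling k-mer hashes. The sketch below contrasts them in plain Python. A Mash-style bottom-k sketch keeps a fixed number of the smallest hashes, so its size is capped at --sketch_size. A Sourmash-style scaled (FracMinHash) sketch keeps every hash below a threshold of about 2^64 / scale, so its size grows with the number of distinct k-mers. It is illustrative only: blake2b stands in for the MurmurHash3-based hashing the real tools use, and all function names are hypothetical.

```python
import hashlib
import heapq

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_hashes(seq: str, k: int) -> set[int]:
    """64-bit hashes of canonical k-mers (blake2b stands in for MurmurHash3)."""
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        revcomp = kmer.translate(COMPLEMENT)[::-1]
        canonical = min(kmer, revcomp)  # same hash for both strands
        digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "little"))
    return hashes


def bottom_k_sketch(hashes: set[int], sketch_size: int) -> set[int]:
    """Mash-style sketch: keep at most `sketch_size` of the smallest hashes."""
    return set(heapq.nsmallest(sketch_size, hashes))


def scaled_sketch(hashes: set[int], scale: int) -> set[int]:
    """FracMinHash-style sketch: keep hashes below 2**64 // scale (about 1 in `scale`)."""
    max_hash = 2**64 // scale
    return {h for h in hashes if h < max_hash}


def bottom_k_jaccard(a: set[int], b: set[int], sketch_size: int) -> float:
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the `sketch_size` smallest hashes of the combined sketches and
    count the fraction that appear in both.
    """
    union_sketch = bottom_k_sketch(a | b, sketch_size)
    if not union_sketch:
        return 0.0
    return len(union_sketch & a & b) / len(union_sketch)
```

With the defaults above, a bottom-k sketch never exceeds 10,000 hashes, whereas a scaled sketch of a genome with about 5 million distinct k-mers would hold roughly 500 hashes at a scale of 10,000.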

Citations

If you use Bactopia and sketcher in your analysis, please cite the following.