Merlin

MinmER assisted species-specific bactopia tool seLectIoN, or Merlin, uses Mash distances based on the RefSeq sketch downloaded by bactopia datasets to automatically run species-specific tools.

Currently, Merlin knows 16 spells, which cover the following:

Genus/Species	Tools
Escherichia / Shigella	ECTyper, ShigaTyper, ShigEiFinder
Haemophilus	hicap, HpsuisSero
Klebsiella	Kleborate
Legionella	legsta
Listeria	LisSero
Mycobacterium	TBProfiler
Neisseria	meningotype, ngmaster
Pseudomonas	pasty
Salmonella	SeqSero2, SISTR
Staphylococcus	AgrVATE, spaTyper, staphopia-sccmec
Streptococcus	emmtyper, pbptyper, SsuisSero

Merlin is available as an independent Bactopia Tool, or within Bactopia itself via the --ask_merlin parameter. If you want to force Merlin to execute all species-specific tools, no matter the distance, use --full_merlin. Then all the spells will be unleashed!
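
The selection logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Bactopia's implementation: the `MERLIN_SPELLS` dictionary and the `select_tools` function are hypothetical names, the genus-to-tool mapping is copied from the table above, and the inclusive `<=` comparison against the `--merlin_dist` default of `0.1` is an assumption.

```python
# Hypothetical genus-to-tool map, copied from the table above.
MERLIN_SPELLS = {
    "Escherichia": ["ECTyper", "ShigaTyper", "ShigEiFinder"],
    "Shigella": ["ECTyper", "ShigaTyper", "ShigEiFinder"],
    "Haemophilus": ["hicap", "HpsuisSero"],
    "Klebsiella": ["Kleborate"],
    "Legionella": ["legsta"],
    "Listeria": ["LisSero"],
    "Mycobacterium": ["TBProfiler"],
    "Neisseria": ["meningotype", "ngmaster"],
    "Pseudomonas": ["pasty"],
    "Salmonella": ["SeqSero2", "SISTR"],
    "Staphylococcus": ["AgrVATE", "spaTyper", "staphopia-sccmec"],
    "Streptococcus": ["emmtyper", "pbptyper", "SsuisSero"],
}


def select_tools(genus_distances, merlin_dist=0.1, full_merlin=False):
    """Return the tools to run, in table order and without duplicates.

    genus_distances maps a genus name to its smallest Mash distance.
    With full_merlin=True the distances are ignored and every tool is returned.
    """
    tools = []
    for genus, genus_tools in MERLIN_SPELLS.items():
        distance = genus_distances.get(genus)
        # Assumption: a genus qualifies when its distance is <= merlin_dist.
        close_enough = distance is not None and distance <= merlin_dist
        if full_merlin or close_enough:
            for tool in genus_tools:
                if tool not in tools:
                    tools.append(tool)
    return tools
```

For example, `select_tools({"Salmonella": 0.02, "Klebsiella": 0.3})` selects only the Salmonella tools, while passing `full_merlin=True` returns every tool in the table regardless of distance.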

Output Overview¶

Below is the default output structure for the merlin step in Bactopia. Where possible, the file descriptions below were adapted from each tool's own documentation.

<BACTOPIA_DIR>
├── <SAMPLE_NAME>
│   └── tools
│       ├── agrvate
│       │   ├── <SAMPLE_NAME>-agr_gp.tab
│       │   ├── <SAMPLE_NAME>-blastn_log.txt
│       │   ├── <SAMPLE_NAME>-hmm-log.txt
│       │   ├── <SAMPLE_NAME>-hmm.tab
│       │   ├── <SAMPLE_NAME>-summary.tab
│       │   └── logs
│       │       ├── nf-agrvate.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── ectyper
│       │   ├── <SAMPLE_NAME>.tsv
│       │   ├── blast_output_alleles.txt
│       │   └── logs
│       │       ├── ectyper.log
│       │       ├── nf-ectyper.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── emmtyper
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-emmtyper.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── genotyphi
│       │   ├── <SAMPLE_NAME>.csv
│       │   ├── <SAMPLE_NAME>.json
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── genotyphi
│       │       │   ├── nf-genotyphi.{begin,err,log,out,run,sh,trace}
│       │       │   └── versions.yml
│       │       └── mykrobe
│       │           ├── nf-genotyphi.{begin,err,log,out,run,sh,trace}
│       │           └── versions.yml
│       ├── hicap
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-hicap.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── hpsuissero
│       │   ├── <SAMPLE_NAME>_serotyping_res.tsv
│       │   └── logs
│       │       ├── nf-hpsuissero.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── kleborate
│       │   ├── <SAMPLE_NAME>.results.txt
│       │   └── logs
│       │       ├── nf-kleborate.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── legsta
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-legsta.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── mashdist
│       │   └── merlin
│       │       ├── <SAMPLE_NAME>-dist.txt
│       │       └── logs
│       │           ├── nf-mashdist.{begin,err,log,out,run,sh,trace}
│       │           └── versions.yml
│       ├── meningotype
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-meningotype.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── ngmaster
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-ngmaster.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── pasty
│       │   ├── <SAMPLE_NAME>.blastn.tsv
│       │   ├── <SAMPLE_NAME>.details.tsv
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-pasty.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── pbptyper
│       │   ├── <SAMPLE_NAME>-1A.tblastn.tsv
│       │   ├── <SAMPLE_NAME>-2B.tblastn.tsv
│       │   ├── <SAMPLE_NAME>-2X.tblastn.tsv
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-pbptyper.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── seqsero2
│       │   ├── <SAMPLE_NAME>_log.txt
│       │   ├── <SAMPLE_NAME>_result.tsv
│       │   ├── <SAMPLE_NAME>_result.txt
│       │   └── logs
│       │       ├── nf-seqsero2.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── seroba
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-seroba.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── shigatyper
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-shigatyper.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── shigeifinder
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-shigeifinder.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── sistr
│       │   ├── <SAMPLE_NAME>-allele.fasta.gz
│       │   ├── <SAMPLE_NAME>-allele.json.gz
│       │   ├── <SAMPLE_NAME>-cgmlst.csv
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-sistr.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── spatyper
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-spatyper.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── ssuissero
│       │   ├── <SAMPLE_NAME>_serotyping_res.tsv
│       │   └── logs
│       │       ├── nf-ssuissero.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── staphopiasccmec
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-staphopiasccmec.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── stecfinder
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf-stecfinder.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       └── tbprofiler
│           ├── <SAMPLE_NAME>.results.csv
│           ├── <SAMPLE_NAME>.results.json
│           ├── <SAMPLE_NAME>.results.txt
│           ├── bam
│           │   └── <SAMPLE_NAME>.bam
│           ├── logs
│           │   ├── nf-tbprofiler.{begin,err,log,out,run,sh,trace}
│           │   └── versions.yml
│           └── vcf
│               └── <SAMPLE_NAME>.targets.csq.vcf.gz
└── bactopia-runs
    └── merlin-<TIMESTAMP>
        ├── merged-results
        │   ├── agrvate.tsv
        │   ├── ectyper.tsv
        │   ├── emmtyper.tsv
        │   ├── genotyphi.tsv
        │   ├── hicap.tsv
        │   ├── hpsuissero.tsv
        │   ├── kleborate.tsv
        │   ├── legsta.tsv
        │   ├── logs
        │   │   └── <BACTOPIA_TOOL>-concat
        │   │       ├── nf-merged-results.{begin,err,log,out,run,sh,trace}
        │   │       └── versions.yml
        │   ├── meningotype.tsv
        │   ├── ngmaster.tsv
        │   ├── pasty.tsv
        │   ├── pbptyper.tsv
        │   ├── seqsero2.tsv
        │   ├── seroba.tsv
        │   ├── shigatyper.tsv
        │   ├── shigeifinder.tsv
        │   ├── sistr.tsv
        │   ├── spatyper.tsv
        │   ├── ssuissero.tsv
        │   ├── staphopiasccmec.tsv
        │   └── stecfinder.tsv
        └── nf-reports
            ├── merlin-dag.dot
            ├── merlin-report.html
            ├── merlin-timeline.html
            └── merlin-trace.txt

Directory structure might be different

Depending on the options used at runtime, the merlin directory structure might be different, but the output descriptions below still apply.
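
Given the default layout shown above, per-sample results for a tool live under `<BACTOPIA_DIR>/<SAMPLE_NAME>/tools/<tool>/`. The following is a minimal sketch for collecting those files while skipping the `logs` folders. The function name `tool_outputs` is hypothetical, and the sketch assumes the default directory structure, which may differ depending on runtime options.

```python
from pathlib import Path


def tool_outputs(bactopia_dir, sample, tool):
    """List result files for one sample and tool, excluding anything under logs/."""
    tool_dir = Path(bactopia_dir) / sample / "tools" / tool
    if not tool_dir.is_dir():
        # The tool was not run for this sample (or the layout differs).
        return []
    return sorted(
        path
        for path in tool_dir.rglob("*")
        if path.is_file() and "logs" not in path.relative_to(tool_dir).parts
    )
```

Because the search is recursive, nested outputs such as `mashdist/merlin/<SAMPLE_NAME>-dist.txt` or the `bam` and `vcf` folders of `tbprofiler` are picked up as well.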

Results¶

Merged Results¶

Below are results that are concatenated into a single file.

Filename	Description
agrvate.tsv	A merged TSV file with `AgrVATE` results from all samples
clermontyping.csv	A merged CSV file with `ClermonTyping` results from all samples
ectyper.tsv	A merged TSV file with `ECTyper` results from all samples
emmtyper.tsv	A merged TSV file with `emmtyper` results from all samples
genotyphi.tsv	A merged TSV file with `genotyphi` results from all samples
hicap.tsv	A merged TSV file with `hicap` results from all samples
hpsuissero.tsv	A merged TSV file with `HpsuisSero` results from all samples
kleborate.tsv	A merged TSV file with `Kleborate` results from all samples
legsta.tsv	A merged TSV file with `legsta` results from all samples
lissero.tsv	A merged TSV file with `LisSero` results from all samples
meningotype.tsv	A merged TSV file with `meningotype` results from all samples
ngmaster.tsv	A merged TSV file with `ngmaster` results from all samples
pasty.tsv	A merged TSV file with `pasty` results from all samples
pbptyper.tsv	A merged TSV file with `pbptyper` results from all samples
seqsero2.tsv	A merged TSV file with `seqsero2` results from all samples
seroba.tsv	A merged TSV file with `seroba` results from all samples
shigapass.csv	A merged CSV file with `ShigaPass` results from all samples
shigatyper.tsv	A merged TSV file with `ShigaTyper` results from all samples
shigeifinder.tsv	A merged TSV file with `ShigEiFinder` results from all samples
sistr.tsv	A merged TSV file with `SISTR` results from all samples
spatyper.tsv	A merged TSV file with `spaTyper` results from all samples
ssuissero.tsv	A merged TSV file with `SsuisSero` results from all samples
staphopiasccmec.tsv	A merged TSV file with `staphopia-sccmec` results from all samples
stecfinder.tsv	A merged TSV file with `stecfinder` results from all samples
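
As a rough sketch, the merged files can be loaded with Python's standard `csv` module. The function names below are hypothetical, and two assumptions are made: each merged file starts with a header row, and the delimiter follows the file extension (tab for `.tsv`, comma for `.csv`). Column names are not assumed; they are whatever each tool writes.

```python
import csv
from pathlib import Path


def read_merged(path):
    """Read one merged results file into a list of dictionaries keyed by header."""
    path = Path(path)
    # Assumption: .csv files are comma-delimited, everything else is tab-delimited.
    delimiter = "," if path.suffix == ".csv" else "\t"
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def load_merged_results(run_dir):
    """Load every merged TSV/CSV under <run_dir>/merged-results, keyed by tool name."""
    merged_dir = Path(run_dir) / "merged-results"
    return {
        path.stem: read_merged(path)
        for path in sorted(merged_dir.glob("*.[ct]sv"))
    }
```

Here `run_dir` would point at a `bactopia-runs/merlin-<TIMESTAMP>` folder, and the returned dictionary has one entry per tool that produced merged results.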

AgrVATE¶

Below is a description of the per-sample results from AgrVATE.

Extension	Description
-agr_gp.tab	A detailed report for agr kmer matches
-blastn_log.txt	Log files from programs called by `AgrVATE`
-summary.tab	A final summary report for agr typing

ClermonTyping¶

Below is a description of the per-sample results from ClermonTyping.

Filename	Description
<SAMPLE_NAME>.blast.xml	A BLAST XML file with the results of the ClermonTyping analysis
<SAMPLE_NAME>.html	An HTML file with the results of the ClermonTyping analysis
<SAMPLE_NAME>.mash.tsv	A TSV file with the Mash distances
<SAMPLE_NAME>.phylogroups.txt	A TSV file with the final phylogroup assignments

ECTyper¶

Below is a description of the per-sample results from ECTyper.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with `ECTyper` result, see ECTyper - Report format for details
blast_output_alleles.txt	Allele report generated from BLAST results

emmtyper¶

Below is a description of the per-sample results from emmtyper.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with `emmtyper` result, see emmtyper - Result format for details

hicap¶

Below is a description of the per-sample results from hicap.

Filename	Description
<SAMPLE_NAME>.gbk	GenBank file and cap locus annotations
<SAMPLE_NAME>.svg	Visualization of annotated cap locus
<SAMPLE_NAME>.tsv	A tab-delimited file with `hicap` results

HpsuisSero¶

Below is a description of the per-sample results from HpsuisSero.

Filename	Description
<SAMPLE_NAME>_serotyping_res.tsv	A tab-delimited file with `HpsuisSero` result

GenoTyphi¶

Below is a description of the per-sample results from GenoTyphi. A full description of the GenoTyphi output is available at GenoTyphi - Output.

Filename	Description
<SAMPLE_NAME>_predictResults.tsv	A tab-delimited file with `GenoTyphi` results
<SAMPLE_NAME>.csv	The output of `mykrobe predict` in comma-separated format
<SAMPLE_NAME>.json	The output of `mykrobe predict` in JSON format

Kleborate¶

Below is a description of the per-sample results from Kleborate.

Filename	Description
<SAMPLE_NAME>.results.txt	A tab-delimited file with `Kleborate` result, see Kleborate - Example output for more details.

legsta¶

Below is a description of the per-sample results from legsta.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with `legsta` result, see legsta - Output for more details

LisSero¶

Below is a description of the per-sample results from LisSero.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with `LisSero` results

Mash¶

Below is a description of the per-sample results from Mash.

Filename	Description
<SAMPLE_NAME>-dist.txt	A tab-delimited file with `mash dist` results
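
A minimal sketch for reading this file is shown below. It assumes the standard five-column `mash dist` layout (reference, query, distance, p-value, shared hashes) and silently skips any line that does not fit, such as a header. The names `MashHit` and `parse_mash_dist` are hypothetical, and the default thresholds simply mirror the `--max_dist` and `--max_p` defaults of `1.0`.

```python
from typing import NamedTuple


class MashHit(NamedTuple):
    reference: str
    query: str
    distance: float
    p_value: float
    shared_hashes: str  # for example "950/1000"


def parse_mash_dist(lines, max_dist=1.0, max_p=1.0):
    """Parse tab-delimited mash dist lines and return hits sorted by distance."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # not a standard mash dist row
        try:
            hit = MashHit(
                fields[0], fields[1], float(fields[2]), float(fields[3]), fields[4]
            )
        except ValueError:
            continue  # header or malformed numeric column
        if hit.distance <= max_dist and hit.p_value <= max_p:
            hits.append(hit)
    return sorted(hits, key=lambda hit: hit.distance)
```

Passing an open file handle as `lines` works, since the function only iterates over it. Lowering `max_dist` to `0.1` approximates the filter applied by the `--merlin_dist` default.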

meningotype¶

Below is a description of the per-sample results from meningotype.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with `meningotype` result

ngmaster¶

Below is a description of the per-sample results from ngmaster.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with `ngmaster` results

pasty¶

Below is a description of the per-sample results from pasty.

Extension	Description
.blastn.tsv	A tab-delimited file of all blast hits
.details.tsv	A tab-delimited file with details for each serogroup
.tsv	A tab-delimited file with the predicted serogroup

pbptyper¶

Below is a description of the per-sample results from pbptyper.

Extension	Description
.tblastn.tsv	A tab-delimited file of all blast hits
.tsv	A tab-delimited file with the predicted PBP type

SeqSero2¶

Below is a description of the per-sample results from SeqSero2.

Filename	Description
<SAMPLE_NAME>_result.tsv	A tab-delimited file with `SeqSero2` results
<SAMPLE_NAME>_result.txt	A text file with key-value pairs of `SeqSero2` results

Seroba¶

Below is a description of the per-sample results from Seroba. More details about the outputs are available from Seroba - Output.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with the predicted serotype
detailed_serogroup_info.txt	Detailed information about the predicted results

ShigaPass¶

Below is a description of the per-sample results from ShigaPass.

Filename	Description
<SAMPLE_NAME>.csv	A CSV file with the predicted Shigella or EIEC serotype

ShigaTyper¶

Below is a description of the per-sample results from ShigaTyper.

Filename	Description
<SAMPLE_NAME>-hits.tsv	Detailed statistics about each individual gene hit
<SAMPLE_NAME>.tsv	The final predicted serotype by `ShigaTyper`

ShigEiFinder¶

Below is a description of the per-sample results from ShigEiFinder.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with the predicted Shigella or EIEC serotype

SISTR¶

Below is a description of the per-sample results from SISTR.

Filename	Description
<SAMPLE_NAME>-allele.fasta.gz	A FASTA file of the cgMLST allele search results
<SAMPLE_NAME>-allele.json.gz	JSON-formatted cgMLST allele search results, see SISTR - cgMLST search results for more details
<SAMPLE_NAME>-cgmlst.csv	A comma-delimited summary of the cgMLST allele search results
<SAMPLE_NAME>.tsv	A tab-delimited file with `SISTR` results, see SISTR - Primary results for more details

spaTyper¶

Below is a description of the per-sample results from spaTyper.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with `spaTyper` result

SsuisSero¶

Below is a description of the per-sample results from SsuisSero.

Filename	Description
<SAMPLE_NAME>_serotyping_res.tsv	A tab-delimited file with `SsuisSero` results

staphopia-sccmec¶

Below is a description of the per-sample results from staphopia-sccmec.

Filename	Description
<SAMPLE_NAME>.tsv	A tab-delimited file with `staphopia-sccmec` results

TBProfiler¶

Below is a description of the per-sample results from TBProfiler.

Filename	Description
<SAMPLE_NAME>.results.csv	A CSV-formatted `TBProfiler` result file of resistance and strain type
<SAMPLE_NAME>.results.json	A JSON-formatted `TBProfiler` result file of resistance and strain type
<SAMPLE_NAME>.results.txt	A text file with `TBProfiler` results
<SAMPLE_NAME>.bam	BAM file with alignment details
<SAMPLE_NAME>.targets.csq.vcf.gz	VCF with variant info against reference genomes

Audit Trail¶

Below are files that can assist you in understanding which parameters and program versions were used.

Logs¶

Each process that is executed will have a folder named logs. In this folder are helpful files for you to review if the need ever arises.

Extension	Description
.begin	An empty file used to designate the process started
.err	Contains STDERR outputs from the process
.log	Contains both STDERR and STDOUT outputs from the process
.out	Contains STDOUT outputs from the process
.run	The script Nextflow uses to stage/unstage files and queue processes based on given profile
.sh	The script executed by bash for the process
.trace	The Nextflow Trace report for the process
versions.yml	A YAML formatted file with program versions
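
When reviewing many samples, it can help to check which of these log files exist and whether they contain anything. The sketch below is one way to do that, assuming the `nf-<process>.<extension>` naming shown in the directory tree above; `LOG_EXTENSIONS` and `summarize_logs` are hypothetical names.

```python
from pathlib import Path

# Extensions taken from the table above.
LOG_EXTENSIONS = ("begin", "err", "log", "out", "run", "sh", "trace")


def summarize_logs(logs_dir, process):
    """Map each log extension to its file size in bytes, or None if the file is missing."""
    logs_dir = Path(logs_dir)
    summary = {}
    for extension in LOG_EXTENSIONS:
        path = logs_dir / f"nf-{process}.{extension}"
        summary[extension] = path.stat().st_size if path.is_file() else None
    return summary
```

A non-empty `.err` file is not necessarily a failure, since many tools write progress messages to STDERR, but it is a reasonable first place to look when a result is missing.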

Parameters¶

mashdist¶

Parameter	Description
`--mash_sketch`	The reference sequence as a Mash Sketch (.msh file). Type: `string`
`--mash_seed`	Seed to provide to the hash function. Type: `integer`, Default: `42`
`--mash_table`	Table output (fields will be blank if they do not meet the p-value threshold). Type: `boolean`
`--mash_m`	Minimum copies of each k-mer required to pass noise filter for reads. Type: `integer`, Default: `1`
`--mash_w`	Probability threshold for warning about low k-mer size. Type: `number`, Default: `0.01`
`--max_p`	Maximum p-value to report. Type: `number`, Default: `1.0`
`--max_dist`	Maximum distance to report. Type: `number`, Default: `1.0`
`--merlin_dist`	Maximum distance to report when using Merlin. Type: `number`, Default: `0.1`
`--full_merlin`	Go full Merlin and run all species-specific tools, no matter the Mash distance. Type: `boolean`
`--use_fastqs`	Query with FASTQs instead of the assemblies. Type: `boolean`