csuite
Welcome to csuite’s documentation!
csuite: Streamlined workflows for query-based gene cluster mining.
The csuite provides easy-to-use, single-command workflows for query-based gene cluster mining. It wraps several mining and processing tools into end-to-end workflows, removing the considerable overhead of passing files and settings from one tool to the next.
The csuite bundles the following tools:
- cblaster: sequence similarity-based gene cluster mining
- cfoldseeker: protein structure similarity-based gene cluster mining
- CAGEcleaner: hit redundancy removal with respect for both gene cluster and host diversity
- clinker: interactive gene cluster visualisations
Tip
The csuite workflows follow a design philosophy similar to that of the easy-* combo commands in MMseqs2 and Foldseek!
For more fine-grained settings, you can still run the member tools of the csuite separately. You can just call them inside csuite’s conda environment or Docker container.
The csuite currently supports the workflows listed below.
- local_struc_derep: Local structure-based search with hit dereplication
  - Local context database construction using cfoldseeker-cds
  - Structure-based search in local mode using cfoldseeker
  - Hit dereplication in local mode using CAGEcleaner
- local_struc: Local structure-based search
  - Local context database construction using cfoldseeker-cds
  - Structure-based search in local mode using cfoldseeker
- remote_struc_derep: Remote structure-based search with hit dereplication
  - Structure-based search in remote mode using cfoldseeker
  - Hit dereplication in remote mode using CAGEcleaner
- remote_struc: Remote structure-based search
  - Structure-based search in remote mode using cfoldseeker
- local_seq_derep: Local sequence-based search with hit dereplication
  - Sequence database construction using cblaster makedb
  - Sequence-based search in local mode using cblaster search
  - Hit dereplication in local mode using CAGEcleaner
- local_seq: Local sequence-based search
  - Sequence database construction using cblaster makedb
  - Sequence-based search in local mode using cblaster search
- remote_seq_derep: Remote sequence-based search with hit dereplication
  - Sequence-based search in remote mode using cblaster search
  - Hit dereplication in remote mode using CAGEcleaner
- remote_seq: Remote sequence-based search
  - Sequence-based search in remote mode using cblaster search
- derep: Hit dereplication
  - Hit dereplication using CAGEcleaner (unified interface for all search modes)
- report: Generate reports for an existing session
  - Generating summary, plot and binary table files using cblaster
  - Alignment visualisation using clinker
- remote_extract: Export cluster GenBank files from a remote mode search session
  - Generate cluster GenBank files for an existing session using cblaster extract_clusters
- local_extract: Export cluster GenBank files from a local mode search session
  - Generate cluster GenBank files for an existing session using cfoldseeker-seqs
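The search workflow names above follow a regular pattern: a search mode (local or remote), a similarity type (struc or seq), and an optional _derep suffix. The following Python sketch transcribes the workflow list into a lookup table and shows how a name can be assembled from those three choices. It is purely illustrative: the WORKFLOWS dictionary and the search_workflow helper are hypothetical names introduced here and are not part of csuite's API or command-line interface.

```python
# Illustrative only: WORKFLOWS and search_workflow are hypothetical helpers,
# not csuite functions. The tool names are transcribed from the list above.
WORKFLOWS = {
    "local_struc_derep": ["cfoldseeker-cds", "cfoldseeker", "CAGEcleaner"],
    "local_struc": ["cfoldseeker-cds", "cfoldseeker"],
    "remote_struc_derep": ["cfoldseeker", "CAGEcleaner"],
    "remote_struc": ["cfoldseeker"],
    "local_seq_derep": ["cblaster makedb", "cblaster search", "CAGEcleaner"],
    "local_seq": ["cblaster makedb", "cblaster search"],
    "remote_seq_derep": ["cblaster search", "CAGEcleaner"],
    "remote_seq": ["cblaster search"],
    "derep": ["CAGEcleaner"],
    "report": ["cblaster", "clinker"],
    "remote_extract": ["cblaster extract_clusters"],
    "local_extract": ["cfoldseeker-seqs"],
}


def search_workflow(mode: str, similarity: str, dereplicate: bool = False) -> str:
    """Assemble a search workflow name from its three components."""
    if mode not in ("local", "remote"):
        raise ValueError(f"unknown search mode: {mode!r}")
    if similarity not in ("struc", "seq"):
        raise ValueError(f"unknown similarity type: {similarity!r}")
    name = f"{mode}_{similarity}"
    if dereplicate:
        name += "_derep"
    return name


if __name__ == "__main__":
    name = search_workflow("remote", "seq", dereplicate=True)
    print(name)             # remote_seq_derep
    print(WORKFLOWS[name])  # ['cblaster search', 'CAGEcleaner']
```

Read this way, the local workflows differ from their remote counterparts only by the extra database construction step, and each _derep variant appends a CAGEcleaner step to the corresponding plain search.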
References and citations
If you find csuite useful, please cite our manuscript:
In preparation
Our member tools rely heavily on the following tools, so please give these proper credit as well:
Gilchrist, C.L.M., Booth, T.J., van Wersch, B., van Grieken, L., Medema, M.H., & Chooi, Y-H. (2021). cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters. Bioinformatics Advances, https://doi.org/10.1093/bioadv/vbab016
van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., & Steinegger, M. (2024). Fast and accurate protein structure search with Foldseek. Nature Biotechnology, 42, https://doi.org/10.1038/s41587-023-01773-0
Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35, https://doi.org/10.1038/nbt.3988
Huckvale, E., & Moseley, H.N.B. (2023). kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes. BMC Bioinformatics, 24(78), https://doi.org/10.1186/s12859-023-05208-0
Salamzade, R., & Kalan, L. R. (2025). skDER and CiDDER: two scalable approaches for microbial genome dereplication. Microbial Genomics, 11(7), https://doi.org/10.1099/mgen.0.001438
Shaw, J., & Yu, Y. W. (2023). Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods, 20(11), 1661–1665. https://doi.org/10.1038/s41592-023-02018-3