Global Microbial smORFs Catalogue v1.0

The global microbial smORF catalogue (GMSC) is an integrated, consistently processed catalogue of small open reading frames (smORFs) from the microbial world, combining publicly available metagenomes and high-quality isolated microbial genomes. In total, ~965 million non-redundant smORFs (100AA smORFs) were predicted from 63,410 metagenomes across global habitats from the SPIRE database and 87,920 high-quality isolated microbial genomes from the ProGenomes2 database. These smORFs were then clustered at 90% amino acid identity, resulting in ~288 million smORF families (90AA smORF families).
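
As a rough illustration of the non-redundant step, the Python sketch below collapses records that share an identical amino acid sequence. It assumes that "100AA" denotes sequences unique at 100% amino acid identity. It is not the GMSC pipeline: the function name, the input format, and the example identifiers and sequences are all hypothetical. Clustering at 90% identity requires a dedicated sequence clustering tool and is not shown.

```python
def dedupe_exact(records):
    """Group records that have exactly the same amino acid sequence.

    records: iterable of (identifier, sequence) pairs (hypothetical format).
    Returns a dict mapping each unique sequence to the identifiers sharing it.
    """
    groups = {}
    for identifier, sequence in records:
        # Normalise whitespace and case so trivially different spellings match.
        key = sequence.strip().upper()
        groups.setdefault(key, []).append(identifier)
    return groups


# Hypothetical toy input: two identical sequences and one that differs by one residue.
example = [
    ("orf_a", "MKTLLVAG"),
    ("orf_b", "MKTLLVAG"),
    ("orf_c", "MKTLLVAC"),
]

unique = dedupe_exact(example)
print(len(unique))         # number of non-redundant sequences
print(unique["MKTLLVAG"])  # identifiers collapsed into one entry
```

Keeping the list of identifiers per unique sequence, rather than discarding duplicates, preserves the link from each non-redundant entry back to the records it came from.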

  • GMSC annotations include:
    • taxonomy classification
    • habitat assignment
    • quality assessment
    • conserved domain annotation
    • cellular localization prediction

For more information, see Duan et al. (2024).

Copyright (c) 2023-2024 GMSC authors. All rights reserved.