KOSMOS-II

Documents

Comprehensive documentation for the KOSMOS2 platform, including terminology, analysis pipelines, and data descriptions

v1.0.0

January 26, 2026 - Initial Release


Database Statistics

Institutions: 27 hospitals

Samples: 462 patient samples

NGS: 14 Panels

Mutations (SNV/Indel): 7,294 variants across 1,026 genes

Copy Number Variations (CNV): 335 CNV events

Gene Fusions: 6 fusion events

Annotation Database

Reference: hg19 (GRCh37)

VEP: v115

COSMIC: v103, released 18-Nov-2025

  • Precision Medicine

    A medical approach that uses an individual patient's genomic information to guide treatment decisions. The KOSMOS study aims to match patients with targeted therapies based on their specific genetic alterations.


  • Sample

    A biological specimen collected from a patient, typically tumor tissue, used for genomic analysis. Each sample contains clinical metadata including patient demographics and treatment information.


  • eCRF (electronic Case Report Form)

    A digital system for collecting and managing clinical trial data. In the KOSMOS study, the eCRF serves as the centralized data collection system for patient clinical information and treatment outcomes.


  • MTB (Molecular Tumor Board)

    A multidisciplinary team of experts including medical oncologists, molecular pathologists, and bioinformaticians who review patients' NGS results and clinical information to recommend optimal treatment strategies.


  • NGS (Next-Generation Sequencing)

    High-throughput DNA sequencing technology that enables rapid sequencing of entire genomes or targeted gene panels. NGS is fundamental to identifying somatic mutations in cancer patients for precision medicine.


  • Treatment Tier System

    The KOSMOS study's classification of treatment recommendations: Tier 1 (Investigational Product), Tier 2 (Conventional treatment or Off-label use), Tier 3A (Investigator-Initiated Trials), and Tier 3B (Other clinical trials accessible in Korea).


  • Variant

    A genomic alteration or mutation in DNA sequence that differs from the reference genome. Variants can be single nucleotide changes (SNVs), insertions, deletions, or structural changes.


  • Variant Consequence

    The predicted effect of a variant on gene function, such as missense, nonsense, frameshift, splice site, or synonymous mutations.


  • Variant Impact

    The predicted severity of a variant's effect on protein function, typically classified as HIGH, MODERATE, LOW, or MODIFIER.


  • VAF (Variant Allele Frequency)

    The proportion of sequencing reads that support a variant allele at a given genomic position. VAF can indicate clonality and tumor purity.


  • ECOG (Eastern Cooperative Oncology Group) Performance Status

    A scale used to assess how a patient's disease affects their daily living abilities. The scale ranges from 0 (fully active) to 5 (dead).


  • RECIST (Response Evaluation Criteria In Solid Tumors)

    A set of published rules that define when cancer patients improve (respond), stay the same (stabilize), or worsen (progress) during treatments. Common responses include CR (Complete Response), PR (Partial Response), SD (Stable Disease), and PD (Progressive Disease).


  • OS (Overall Survival)

    The length of time from diagnosis or treatment start until death from any cause. OS is a key endpoint in cancer clinical trials.


  • PFS (Progression-Free Survival)

    The length of time during and after treatment that a patient lives with the disease without it getting worse. Both disease progression and death count as PFS events.

KOSMOS2 employs a comprehensive genomic analysis pipeline to process and analyze cancer genomic data:

1. Sequencing & Variant Calling

Tumor-only somatic variant analysis using targeted panel sequencing, with reads aligned to the GRCh37 (hg19) reference genome.

Panel Types:

TSO500 ver2.2
SNUBH Pan_Cancer Ver2
K-MASTER PANEL V1.1
AL100_V2
CancerSCAN v2.2
BrainTumorSCAN v2
CancerSCAN Level.2
Solid Tumor Panel II (TMB/MSI)
OncoPanel AMC v4.5
Oncomine Comprehensive Assay Plus
NGS Pan cancer panel 525 ver.3
CancerSCAN compact
Illumina TruSight Oncology 500 ctDNA
ONCOaccuPanel v4.3
ONCOaccuPanel v4.5
ONCOaccuPanel v1.0
2. Variant Filtering (VCF Processing)
  • Genotype (GT) extraction and DP > 100 filtering

  • Multi-allelic variants split into separate records

  • VAF (Variant Allele Frequency) calculation

  • VAF ≥ 0.05 filtering threshold
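
The filtering steps above can be illustrated with a minimal sketch. It is not the platform's actual implementation: the `Call` class and `split_and_filter` function are hypothetical names, and it assumes that VAF is computed as the alternate-allele read count (from the VCF AD field) divided by the total depth DP.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence

MIN_DEPTH = 100  # records are kept only when DP > 100
MIN_VAF = 0.05   # alleles are kept only when VAF >= 0.05


@dataclass
class Call:
    """One alternate allele at one position (hypothetical container)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int
    alt_reads: int

    @property
    def vaf(self) -> float:
        # Assumption: VAF = alternate-allele reads / total depth.
        return self.alt_reads / self.depth


def split_and_filter(chrom: str, pos: int, ref: str, alts: Sequence[str],
                     depth: int, allele_depths: Sequence[int]) -> Iterator[Call]:
    """Split a (possibly multi-allelic) record and apply both thresholds.

    allele_depths follows the VCF AD convention: the reference count first,
    then one count per alternate allele in the same order as alts.
    """
    if depth <= MIN_DEPTH:
        return
    for alt, alt_reads in zip(alts, allele_depths[1:]):
        call = Call(chrom, pos, ref, alt, depth, alt_reads)
        if call.vaf >= MIN_VAF:
            yield call


# A made-up multi-allelic record: T>G has VAF 0.2, T>A has VAF 0.025.
calls = list(split_and_filter("chr1", 1000, "T", ["G", "A"], 400, [310, 80, 10]))
```

In this example only the T>G allele survives, because the T>A allele falls below the 0.05 VAF threshold.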

3. Variant Annotation (VEP)

Annotation using Ensembl VEP (Variant Effect Predictor) with the following criteria:

  • Transcript Selection

    Canonical transcript only (Canonical = Yes)

  • Population Frequency Filtering

    • gnomAD_exon_EAS_AF < 0.001
    • gnomAD_genome_EAS_AF < 0.001
    • MAX_AF < 0.01

  • Splice Site Prediction

    Variants with a SpliceAI delta score ≥ 0.5 are included

  • Clinical Significance

    ClinVar: Exclude Benign variants

  • Biotype

    Protein coding genes only

  • COSMIC Filtering

    Remove variants with COSMIC Germline = Yes and Somatic = No

  • Consequence Selection
    missense_variant
    stop_gained
    stop_lost
    start_lost
    frameshift_variant
    inframe_deletion
    inframe_insertion
    splice_acceptor_variant
    splice_donor_variant
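
As an illustration of how these criteria combine, the sketch below applies them to one annotated variant held in a dictionary. It is a sketch under stated assumptions, not the platform's code. The keys `ClinVar`, `COSMIC_germline` and `COSMIC_somatic` are hypothetical field names, a missing population frequency is assumed to pass the rarity filter, and only the exact ClinVar label "Benign" is excluded. The SpliceAI criterion is left out because the text does not say how it interacts with the consequence list.

```python
from typing import Mapping, Optional

KEPT_CONSEQUENCES = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_deletion", "inframe_insertion",
    "splice_acceptor_variant", "splice_donor_variant",
}


def _below(value: Optional[float], cutoff: float) -> bool:
    # Assumption: a missing frequency means the variant was not observed
    # in that population database, so it passes the rarity filter.
    return value is None or value < cutoff


def passes_annotation_filters(v: Mapping) -> bool:
    """Return True when a variant meets every criterion listed above."""
    if v.get("CANONICAL") != "YES":           # canonical transcript only
        return False
    if v.get("BIOTYPE") != "protein_coding":  # protein coding genes only
        return False
    if not (_below(v.get("gnomAD_exon_EAS_AF"), 0.001)
            and _below(v.get("gnomAD_genome_EAS_AF"), 0.001)
            and _below(v.get("MAX_AF"), 0.01)):
        return False
    if v.get("ClinVar") == "Benign":          # hypothetical field name
        return False
    if v.get("COSMIC_germline") == "Yes" and v.get("COSMIC_somatic") == "No":
        return False
    # Assumption: multiple consequence terms are joined with "&".
    terms = set(str(v.get("Consequence", "")).split("&"))
    return bool(terms & KEPT_CONSEQUENCES)


example = {
    "CANONICAL": "YES", "BIOTYPE": "protein_coding",
    "gnomAD_exon_EAS_AF": 0.0002, "gnomAD_genome_EAS_AF": None,
    "MAX_AF": 0.004, "ClinVar": "Pathogenic",
    "COSMIC_germline": "No", "COSMIC_somatic": "Yes",
    "Consequence": "missense_variant&splice_region_variant",
}
keep = passes_annotation_filters(example)
```
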
4. Statistical Analysis
  • Kaplan-Meier survival analysis (OS and PFS)

  • Log-rank test for group comparison

5. Visualization & Reporting
  • OncoPlot generation for mutation landscape

  • Lollipop plots for protein-level mutations

  • Kaplan-Meier survival curves

  • Disco Plot for individual sample genomic overview (SNV/Indel, CNV, Fusion on circular chromosome layout)

  • Interactive data exploration interface

KOSMOS2 integrates comprehensive genomic and clinical data:

Sample Data
Sample ID
Gender
Age at Diagnosis
Tumor Tissue Site
Pathologic Stage
Therapy Type
Smoking Status
ECOG Score
RECIST Response
Survival Data
OS Status
OS Time (months)
PFS Status
PFS Time (months)

Status: 0 = censored (alive or no progression), 1 = event (death or progression)
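
To show how the time and status fields feed a Kaplan-Meier curve, here is a minimal pure-Python sketch of the product-limit estimator. The function name `kaplan_meier` and the example values are illustrative only and do not come from the platform.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 status: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    status uses the coding above: 0 = censored, 1 = event.
    """
    pairs = sorted(zip(times, status))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group all patients sharing the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            events += pairs[i][1]
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Five made-up patients: time in months and status.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])
```

For these five patients the estimate steps down to 0.8 at 2 months, 0.6 at 3 months, and 0.3 at 5 months. The censored patient at 8 months does not add a step.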

Mutation Data & Annotation

Each variant is annotated with multiple prediction tools and databases:

  • Basic Information

    Gene Symbol, Chromosome, Position, Reference/Alternate Allele, Variant Type (SNP, INS, DEL)

  • Consequence (Variant Effect)

    Predicted effect of variant on gene/transcript:
    missense_variant: Amino acid changed to different amino acid
    stop_gained: Premature stop codon created (nonsense mutation)
    stop_lost: Stop codon removed, protein extension
    start_lost: Start codon removed
    frameshift_variant: Insertion/deletion causing reading frame shift
    inframe_deletion: Deletion preserving reading frame
    inframe_insertion: Insertion preserving reading frame
    splice_acceptor_variant: Variant in splice acceptor site (3' end of intron)
    splice_donor_variant: Variant in splice donor site (5' end of intron)

  • Impact (Severity)

    Predicted severity of variant effect on protein function:
    HIGH: Likely loss of function (frameshift, stop_gained, splice variants)
    MODERATE: Possible functional change (missense, inframe indels)
    LOW: Unlikely to affect function significantly (synonymous)
    MODIFIER: Non-coding or intergenic variants

  • HGVS Notation

    HGVSc (coding DNA change, e.g., c.1234G>A) and HGVSp (protein change, e.g., p.Val412Met): standardized nomenclature for describing variants.

  • SIFT (Sorting Intolerant From Tolerant)

    Predicts whether an amino acid substitution affects protein function based on sequence homology.
    Tolerated: Score ≥ 0.05 (high confidence)
    Tolerated Low Confidence: Score ≥ 0.05 (low confidence)
    Deleterious: Score < 0.05 (high confidence)
    Deleterious Low Confidence: Score < 0.05 (low confidence)
    Unknown: No prediction available

  • PolyPhen-2 (Polymorphism Phenotyping v2)

    Predicts the impact of amino acid substitutions on protein structure and function.
    Benign: Score < 0.15 (likely neutral)
    Possibly Damaging: Score 0.15-0.85 (potential functional impact)
    Probably Damaging: Score > 0.85 (likely deleterious)
    Unknown: No prediction available

  • REVEL (Rare Exome Variant Ensemble Learner)

    Ensemble method combining multiple tools to predict pathogenicity of missense variants.
    Benign: Score ≤ 0.644 (likely neutral)
    Pathogenic: Score > 0.644 (likely disease-causing)
    Unknown: No prediction available

  • SpliceAI

    Deep learning-based tool that predicts splicing alterations by evaluating delta scores for donor/acceptor gain and loss. Variants with any delta score ≥ 0.5 are marked as PASS, indicating a high likelihood of affecting splicing.

  • dbSNP (Database of Single Nucleotide Polymorphisms)

    NCBI database cataloging known genetic variations (rs IDs). The presence of an rs ID indicates that the variant has been previously reported in the general population.

  • COSMIC (Catalogue Of Somatic Mutations In Cancer)

    Comprehensive database of somatic mutations in cancer. A COSMIC ID indicates that the variant has been observed in cancer samples.

  • ClinVar

    NCBI database of clinically significant variants. Classifications: Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign, and Benign.
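
The score cutoffs listed above for SIFT, PolyPhen-2, REVEL and SpliceAI can be expressed as small lookup functions. This is a sketch with hypothetical function names. It assumes a missing score is passed as `None`, and that the SIFT confidence flag is supplied separately, since it is not derived from the score itself.

```python
from typing import Optional, Sequence


def classify_sift(score: Optional[float], low_confidence: bool = False) -> str:
    if score is None:
        return "Unknown"
    label = "Tolerated" if score >= 0.05 else "Deleterious"
    return label + " Low Confidence" if low_confidence else label


def classify_polyphen(score: Optional[float]) -> str:
    if score is None:
        return "Unknown"
    if score < 0.15:
        return "Benign"
    if score <= 0.85:
        return "Possibly Damaging"
    return "Probably Damaging"


def classify_revel(score: Optional[float]) -> str:
    if score is None:
        return "Unknown"
    return "Benign" if score <= 0.644 else "Pathogenic"


def spliceai_pass(delta_scores: Sequence[float]) -> bool:
    """PASS when any delta score (donor/acceptor gain or loss) is >= 0.5."""
    return any(score >= 0.5 for score in delta_scores)


# Made-up scores for a single missense variant.
labels = (
    classify_sift(0.01),
    classify_polyphen(0.92),
    classify_revel(0.70),
    spliceai_pass([0.02, 0.00, 0.10, 0.01]),
)
```
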

Data Privacy & Security

All patient data is de-identified in accordance with institutional review board (IRB) protocols. The platform implements role-based access control to ensure data security and compliance with privacy standards.

Log-Rank Test

Statistical test that compares survival distributions between two or more groups. A p-value < 0.05 indicates a statistically significant difference in survival between groups. The platform requires a minimum of 10 samples per group to calculate log-rank test statistics.
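
For reference, a two-group log-rank test can be sketched in pure Python as follows. This is an illustrative version and not necessarily how the platform computes it. The function name `logrank_test` is hypothetical, it covers only the two-group case, and it returns `None` when a group is below the 10-sample minimum. The p-value uses the chi-square distribution with one degree of freedom, whose upper tail equals `erfc(sqrt(x / 2))`.

```python
import math
from typing import Optional, Sequence, Tuple

MIN_GROUP_SIZE = 10  # minimum samples per group stated above


def logrank_test(times_a: Sequence[float], status_a: Sequence[int],
                 times_b: Sequence[float], status_b: Sequence[int]
                 ) -> Optional[Tuple[float, float]]:
    """Return (chi-square statistic, p-value), or None if not computable."""
    if len(times_a) < MIN_GROUP_SIZE or len(times_b) < MIN_GROUP_SIZE:
        return None
    event_times = sorted(
        {t for t, s in zip(times_a, status_a) if s == 1}
        | {t for t, s in zip(times_b, status_b) if s == 1}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, s in zip(times_a, status_a) if x == t and s == 1)
        d_b = sum(1 for x, s in zip(times_b, status_b) if x == t and s == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return None
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up data: every patient in group A has an event before any in group B.
early = list(range(1, 11))
late = list(range(11, 21))
result = logrank_test(early, [1] * 10, late, [1] * 10)
```
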

Mutation Frequency Analysis

Calculation of mutation prevalence across samples and genes. Includes identification of significantly mutated genes and hotspot mutations within protein domains.
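
As a simple illustration of the prevalence part of this analysis, the sketch below counts each sample at most once per gene. The function name and the input shape, a list of (sample ID, gene symbol) pairs, are assumptions. Significance testing and hotspot detection are not covered.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def gene_mutation_frequency(mutations: Iterable[Tuple[str, str]],
                            n_samples: int) -> Dict[str, float]:
    """Fraction of samples carrying at least one mutation in each gene."""
    samples_by_gene = defaultdict(set)
    for sample_id, gene in mutations:
        samples_by_gene[gene].add(sample_id)
    return {gene: len(samples) / n_samples
            for gene, samples in samples_by_gene.items()}


# Made-up records for a cohort of four samples; S1 has two TP53 variants.
records = [("S1", "TP53"), ("S1", "TP53"), ("S2", "TP53"), ("S3", "KRAS")]
freq = gene_mutation_frequency(records, n_samples=4)
```

Here TP53 is mutated in 2 of 4 samples (0.5) and KRAS in 1 of 4 (0.25), regardless of how many variants each sample carries in the gene.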

This documentation is continuously updated to reflect the latest features and methodologies implemented in KOSMOS2. For additional questions or clarifications, please contact the support team.