Documents

Comprehensive documentation for the KOSMOS2 platform, including terminology, analysis pipelines, and data descriptions.

Release Notes

v1.0.0

January 26, 2026 - Initial Release

Database Statistics

• Institutions: 27 hospitals

• Samples: 462 patient samples

• NGS: 14 Panels

• Mutations (SNV/Indel): 7,294 variants across 1,026 genes

• Copy Number Variations (CNV): 335 CNV events

• Gene Fusions: 6 fusion events

Annotation Database

• Reference: hg19 (GRCh37)

• VEP: v115

• COSMIC: v103, released 18-Nov-2025

Terminology

Precision Medicine
A medical approach that uses an individual patient's genomic information to guide treatment decisions. The KOSMOS study aims to match patients with targeted therapies based on their specific genetic alterations.

Sample
A biological specimen collected from a patient, typically tumor tissue, used for genomic analysis. Each sample is linked to clinical metadata, including patient demographics and treatment information.

eCRF (electronic Case Report Form)
A digital system for collecting and managing clinical trial data. In the KOSMOS study, the eCRF serves as the centralized system for collecting patient clinical information and treatment outcomes.

MTB (Molecular Tumor Board)
A multidisciplinary team of experts including medical oncologists, molecular pathologists, and bioinformaticians who review patients' NGS results and clinical information to recommend optimal treatment strategies.

NGS (Next-Generation Sequencing)
High-throughput DNA sequencing technology that enables rapid sequencing of entire genomes or targeted gene panels. NGS is fundamental to identifying somatic mutations in cancer patients for precision medicine.

Treatment Tier System
The KOSMOS study's classification of treatment recommendations: Tier 1 (Investigational Product), Tier 2 (Conventional treatment or Off-label use), Tier 3A (Investigator-Initiated Trials), and Tier 3B (Other clinical trials accessible in Korea).

Variant
A genomic alteration or mutation in the DNA sequence that differs from the reference genome. Variants can be single nucleotide variants (SNVs), insertions, deletions, or structural changes.

Variant Consequence
The predicted effect of a variant on gene function, such as missense, nonsense, frameshift, splice site, or synonymous mutations.

Variant Impact
The predicted severity of a variant's effect on protein function, typically classified as HIGH, MODERATE, LOW, or MODIFIER.

VAF (Variant Allele Frequency)
The proportion of sequencing reads that support a variant allele at a given genomic position. VAF can indicate clonality and tumor purity.
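For illustration, VAF is the number of reads supporting the alternate allele divided by the total read depth at that position. The minimal sketch below assumes per-allele read counts such as those in a VCF `AD` field; the function name `variant_allele_frequency` is hypothetical and not part of KOSMOS2.

```python
def variant_allele_frequency(alt_reads: int, total_depth: int) -> float:
    """Return VAF = alt_reads / total_depth (hypothetical helper)."""
    if total_depth <= 0:
        raise ValueError("total_depth must be positive")
    if not 0 <= alt_reads <= total_depth:
        raise ValueError("alt_reads must be between 0 and total_depth")
    return alt_reads / total_depth


# 30 of 200 reads support the alternate allele
print(variant_allele_frequency(30, 200))  # 0.15
```

As a rough rule of thumb, a clonal heterozygous mutation in a diploid region is expected at about half the tumor purity (for example, about 0.3 at 60% purity), which is why VAF carries information about purity and clonality.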

ECOG (Eastern Cooperative Oncology Group) Performance Status
A scale used to assess how a patient's disease affects their ability to carry out daily activities. The scale ranges from 0 (fully active) to 5 (dead).

RECIST (Response Evaluation Criteria In Solid Tumors)
A set of published rules that define when cancer patients improve (respond), stay the same (stabilize), or worsen (progress) during treatments. Common responses include CR (Complete Response), PR (Partial Response), SD (Stable Disease), and PD (Progressive Disease).
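The sketch below shows, in simplified form, how the RECIST 1.1 target-lesion thresholds map a sum of lesion diameters to a response category: PD for an increase of at least 20% over the smallest sum recorded on study together with an absolute increase of at least 5 mm (or any new lesion), CR for disappearance of all target lesions, PR for a decrease of at least 30% from baseline, and SD otherwise. It ignores non-target lesions, lymph-node short-axis rules, and confirmation requirements, and `recist_target_response` is a hypothetical name; it does not describe how KOSMOS2 derives its RECIST Response field.

```python
def recist_target_response(baseline_sum_mm: float, nadir_sum_mm: float,
                           current_sum_mm: float, new_lesions: bool = False) -> str:
    """Simplified RECIST 1.1 category from sums of target-lesion diameters (mm)."""
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum must be positive")
    # Any new lesion is progression regardless of measurements
    if new_lesions:
        return "PD"
    increase = current_sum_mm - nadir_sum_mm
    # PD: >= 20% over the smallest sum on study AND >= 5 mm absolute increase
    if increase >= 5 and (nadir_sum_mm == 0 or increase / nadir_sum_mm >= 0.20):
        return "PD"
    # CR: all target lesions have disappeared
    if current_sum_mm == 0:
        return "CR"
    # PR: >= 30% decrease relative to baseline
    if (baseline_sum_mm - current_sum_mm) / baseline_sum_mm >= 0.30:
        return "PR"
    return "SD"


print(recist_target_response(100, 100, 65))  # PR
print(recist_target_response(100, 60, 75))   # PD (25% above nadir, +15 mm)
```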

OS (Overall Survival)
The length of time from diagnosis or treatment start until death from any cause. OS is a key endpoint in cancer clinical trials.

PFS (Progression-Free Survival)
The length of time during and after treatment that a patient lives with the disease without it getting worse. A PFS event is disease progression or death, whichever occurs first.

Analysis Pipeline

KOSMOS2 employs a comprehensive genomic analysis pipeline to process and analyze cancer genomic data:

1. Sequencing & Variant Calling

Tumor-only somatic variant analysis using targeted panel sequencing, with reads aligned to the GRCh37 (hg19) reference genome.

Panel Types:

TSO500 ver2.2

SNUBH Pan_Cancer Ver2

K-MASTER PANEL V1.1

AL100_V2

CancerSCAN v2.2

BrainTumorSCAN v2

CancerSCAN Level.2

Solid Tumor Panel II (TMB/MSI)

OncoPanel AMC v4.5

Oncomine Comprehensive Assay Plus

NGS Pan cancer panel 525 ver.3

CancerSCAN compact

Illumina TruSight Oncology 500 ctDNA

ONCOaccuPanel v4.3

ONCOaccuPanel v4.5

ONCOaccuPanel v1.0

2. Variant Filtering (VCF Processing)

• Genotype (GT) extraction and read depth filtering (DP > 100)
• Multi-allelic variants split into separate records
• VAF (Variant Allele Frequency) calculation
• VAF ≥ 0.05 filtering threshold
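The four filtering steps above can be sketched as follows. This is a minimal illustration, not the KOSMOS2 code: `VcfRecord` and `split_and_filter` are hypothetical, VAF is assumed to be the ALT allele's `AD` count divided by `DP`, and the genotype is carried over unchanged when a multi-allelic site is split (tools such as `bcftools norm -m-` also recode it).

```python
from dataclasses import dataclass


@dataclass
class VcfRecord:
    """Hypothetical minimal record; a real pipeline would use a VCF parser."""
    chrom: str
    pos: int
    ref: str
    alts: list[str]   # one or more ALT alleles
    gt: str           # FORMAT/GT, e.g. "0/1"
    dp: int           # FORMAT/DP, total read depth
    ad: list[int]     # FORMAT/AD, [ref_count, alt1_count, alt2_count, ...]


def split_and_filter(records, min_dp=100, min_vaf=0.05):
    """Keep DP > min_dp, split multi-allelic sites, compute VAF, keep VAF >= min_vaf."""
    kept = []
    for rec in records:
        if rec.dp <= min_dp:  # depth must be strictly greater than min_dp
            continue
        # One output row per ALT allele (GT carried over unchanged in this sketch)
        for i, alt in enumerate(rec.alts, start=1):
            vaf = rec.ad[i] / rec.dp  # assumption: VAF = ALT AD / DP
            if vaf >= min_vaf:
                kept.append((rec.chrom, rec.pos, rec.ref, alt, rec.gt, round(vaf, 4)))
    return kept


# Made-up records for illustration only
records = [
    VcfRecord("1", 1000, "A", ["T"], "0/1", 250, [200, 50]),
    VcfRecord("1", 2000, "C", ["T", "A"], "1/2", 150, [130, 15, 5]),
    VcfRecord("2", 3000, "G", ["C"], "0/1", 80, [60, 20]),
]
print(split_and_filter(records))
# [('1', 1000, 'A', 'T', '0/1', 0.2), ('1', 2000, 'C', 'T', '1/2', 0.1)]
```

The first record passes, the second is split into two rows of which only the T allele (VAF 0.1) survives, and the third fails the depth filter.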

3. Variant Annotation (VEP)

Variants are annotated with Ensembl VEP (Variant Effect Predictor) and filtered using the following criteria:

Transcript Selection
Canonical transcript only (Canonical = Yes)
Population Frequency Filtering
• gnomAD_exon_EAS_AF < 0.001
• gnomAD_genome_EAS_AF < 0.001
• MAX_AF < 0.01
Splice Site Prediction
Variants with a SpliceAI score ≥ 0.5 are included
Clinical Significance
Exclude variants classified as Benign in ClinVar
Biotype
Protein-coding genes only
COSMIC Filtering
Remove variants with COSMIC Germline = Yes and Somatic = No
Consequence Selection
missense_variant
stop_gained
stop_lost
start_lost
frameshift_variant
inframe_deletion
inframe_insertion
splice_acceptor_variant
splice_donor_variant
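A sketch of how these criteria might be applied to one VEP annotation represented as a dictionary. Field names such as `CANONICAL`, `BIOTYPE`, `CLIN_SIG`, `MAX_AF`, and `Consequence` follow VEP output columns, the gnomAD keys reuse the names listed above, and `COSMIC_GERMLINE`, `COSMIC_SOMATIC`, and `SpliceAI_max` are hypothetical. Two behaviors are assumptions rather than documented rules: missing population frequencies count as 0, and a SpliceAI score ≥ 0.5 keeps a variant even when its consequence is not in the selected list.

```python
import re

SELECTED_CONSEQUENCES = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_deletion", "inframe_insertion",
    "splice_acceptor_variant", "splice_donor_variant",
}


def _freq(value) -> float:
    """Missing frequencies are treated as 0.0 (assumption: not observed)."""
    return 0.0 if value in (None, "", "-", ".") else float(value)


def passes_vep_filters(ann: dict) -> bool:
    """Return True if one VEP annotation (field -> value) passes the criteria."""
    if ann.get("CANONICAL") != "YES":                   # canonical transcript only
        return False
    if ann.get("BIOTYPE") != "protein_coding":          # protein-coding genes only
        return False
    if _freq(ann.get("gnomAD_exon_EAS_AF")) >= 0.001:   # population frequency
        return False
    if _freq(ann.get("gnomAD_genome_EAS_AF")) >= 0.001:
        return False
    if _freq(ann.get("MAX_AF")) >= 0.01:
        return False
    clin_sig = re.split(r"[&,]", (ann.get("CLIN_SIG") or "").lower())
    if "benign" in clin_sig:                            # ClinVar Benign excluded
        return False
    # Hypothetical COSMIC flags: germline-only entries are removed
    if ann.get("COSMIC_GERMLINE") == "Yes" and ann.get("COSMIC_SOMATIC") == "No":
        return False
    consequences = set((ann.get("Consequence") or "").split("&"))
    if consequences & SELECTED_CONSEQUENCES:
        return True
    # Assumption: SpliceAI >= 0.5 retains variants outside the consequence list
    return _freq(ann.get("SpliceAI_max")) >= 0.5


base = {"CANONICAL": "YES", "BIOTYPE": "protein_coding",
        "MAX_AF": "0.0001", "CLIN_SIG": "pathogenic"}
print(passes_vep_filters({**base, "Consequence": "missense_variant"}))    # True
print(passes_vep_filters({**base, "Consequence": "synonymous_variant"}))  # False
```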

4. Statistical Analysis

• Kaplan-Meier survival analysis (OS and PFS)
• Log-rank test for group comparison
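For reference, the Kaplan-Meier estimator multiplies, at each distinct event time t, the conditional survival (1 - d/n), where d is the number of events at t and n is the number of samples still at risk just before t. The minimal pure-Python sketch below uses the status coding described under Survival Data (1 = event, 0 = censored); `kaplan_meier` is a hypothetical helper, not the platform's implementation.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up times (e.g. months); events: 1 = event, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations sharing the same time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censorings leave the risk set after t
    return curve


times = [2, 3, 3, 5, 8, 10]
events = [1, 1, 0, 1, 0, 1]
print([(t, round(s, 4)) for t, s in kaplan_meier(times, events)])
# [(2, 0.8333), (3, 0.6667), (5, 0.4444), (10, 0.0)]
```

Samples censored at an event time are still counted as at risk at that time, which is the usual convention.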

5. Visualization & Reporting

• OncoPlot generation for mutation landscape
• Lollipop plots for protein-level mutations
• Kaplan-Meier survival curves
• Disco Plot for a genomic overview of an individual sample (SNV/Indel, CNV, and fusions on a circular chromosome layout)
• Interactive data exploration interface
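A lollipop plot needs a count of mutations per amino acid position. The sketch below extracts the first residue position from HGVSp strings (including VEP-style prefixes such as the placeholder `ENSP00000000001.1:p.Gly12Asp`) and tallies them; the regular expression and function names are illustrative assumptions, and notations without a position, such as `p.=`, are skipped.

```python
import re
from collections import Counter

# First residue and its position, e.g. "p.Gly12Asp" -> 12; "p.(Gly12Asp)" also matches
_HGVSP_POS = re.compile(r"p\.\(?([A-Z][a-z]{2})(\d+)")


def protein_position(hgvsp: str):
    """Return the first protein position in an HGVSp string, or None."""
    m = _HGVSP_POS.search(hgvsp)
    return int(m.group(2)) if m else None


def lollipop_counts(hgvsp_list):
    """Count mutations per protein position, sorted by position."""
    counts = Counter()
    for h in hgvsp_list:
        pos = protein_position(h)
        if pos is not None:
            counts[pos] += 1
    return sorted(counts.items())


muts = ["ENSP00000000001.1:p.Gly12Asp", "p.Gly12Val", "p.Gly13Asp", "p.Gln61His", "p.="]
print(lollipop_counts(muts))  # [(12, 2), (13, 1), (61, 1)]
```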

Data Description

KOSMOS2 integrates comprehensive genomic and clinical data:

Sample Data

Sample ID

Gender

Age at Diagnosis

Tumor Tissue Site

Pathologic Stage

Therapy Type

Smoking Status

ECOG Score

RECIST Response

Survival Data

OS Status

OS Time (months)

PFS Status

PFS Time (months)

Status: 0 = censored (alive for OS; alive without progression for PFS), 1 = event (death for OS; progression or death for PFS)

Mutation Data & Annotation

Each variant is annotated with multiple prediction tools and databases:

Basic Information
Gene Symbol, Chromosome, Position, Reference/Alternate Allele, Variant Type (SNP, INS, DEL)
Consequence (Variant Effect)
Predicted effect of variant on gene/transcript:
• missense_variant: An amino acid is replaced by a different amino acid
• stop_gained: A premature stop codon is created (nonsense mutation)
• stop_lost: The stop codon is lost, extending the protein
• start_lost: The start codon is lost
• frameshift_variant: An insertion or deletion shifts the reading frame
• inframe_deletion: A deletion that preserves the reading frame
• inframe_insertion: An insertion that preserves the reading frame
• splice_acceptor_variant: A variant in the splice acceptor site (3' end of the intron)
• splice_donor_variant: A variant in the splice donor site (5' end of the intron)
Impact (Severity)
Predicted severity of variant effect on protein function:
• HIGH: Likely loss of function (e.g., frameshift, stop_gained, stop_lost, start_lost, splice acceptor/donor variants)
• MODERATE: Possible functional change (missense, inframe indels)
• LOW: Unlikely to affect function significantly (synonymous)
• MODIFIER: Non-coding or intergenic variants
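The correspondence between the retained consequences and their VEP impact tiers can be written as a lookup table. This sketch assumes VEP's convention of joining multiple consequence terms with `&` and reports the most severe tier among them; terms missing from the table default to MODIFIER here, which is a simplification (VEP itself reports an `IMPACT` field directly). The names `IMPACT_BY_CONSEQUENCE` and `impact` are hypothetical.

```python
# Impact tiers as assigned by Ensembl VEP for the consequences discussed above
IMPACT_BY_CONSEQUENCE = {
    "stop_gained": "HIGH", "stop_lost": "HIGH", "start_lost": "HIGH",
    "frameshift_variant": "HIGH", "splice_acceptor_variant": "HIGH",
    "splice_donor_variant": "HIGH",
    "missense_variant": "MODERATE", "inframe_deletion": "MODERATE",
    "inframe_insertion": "MODERATE",
    "splice_region_variant": "LOW", "synonymous_variant": "LOW",
    "intron_variant": "MODIFIER", "intergenic_variant": "MODIFIER",
}
RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}


def impact(consequence: str) -> str:
    """Most severe impact among '&'-joined consequence terms (unknown -> MODIFIER)."""
    tiers = [IMPACT_BY_CONSEQUENCE.get(term, "MODIFIER") for term in consequence.split("&")]
    return max(tiers, key=RANK.__getitem__)


print(impact("missense_variant&splice_region_variant"))  # MODERATE
print(impact("frameshift_variant"))                      # HIGH
```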
HGVS Notation
HGVSc (coding DNA change, e.g., c.1234G>A) and HGVSp (protein change, e.g., p.Val412Met): standardized nomenclature for describing variants
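A worked example of the arithmetic that links the two notations: coding position c.N falls in codon (N + 2) // 3 (integer division), so c.1234 lies in the first base of codon 412. The helper names below are hypothetical, only simple exonic positions and single-residue substitutions are handled, and the three-to-one letter table covers the 20 standard amino acids plus `Ter`.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V", "Ter": "*",
}


def codon_of(c_position: int) -> tuple[int, int]:
    """Map coding position c.N (N >= 1) to (codon number, base within codon)."""
    return (c_position + 2) // 3, (c_position - 1) % 3 + 1


def short_protein_change(hgvsp: str) -> str:
    """Convert a simple 3-letter substitution such as p.Val412Met to V412M."""
    m = re.fullmatch(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", hgvsp)
    if m is None:
        raise ValueError(f"unsupported HGVSp: {hgvsp}")
    ref, pos, alt = m.groups()
    return f"{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[alt]}"


print(codon_of(1234))                       # (412, 1)
print(short_protein_change("p.Val412Met"))  # V412M
```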
SIFT (Sorting Intolerant From Tolerant)
Predicts whether an amino acid substitution affects protein function based on sequence homology.
• Tolerated: Score ≥ 0.05 (high confidence)
• Tolerated Low Confidence: Score ≥ 0.05 (low confidence)
• Deleterious: Score < 0.05 (high confidence)
• Deleterious Low Confidence: Score < 0.05 (low confidence)
• Unknown: No prediction available
PolyPhen-2 (Polymorphism Phenotyping v2)
Predicts the impact of amino acid substitutions on protein structure and function.
• Benign: Score < 0.15 (likely neutral)
• Possibly Damaging: Score 0.15-0.85 (potential functional impact)
• Probably Damaging: Score > 0.85 (likely deleterious)
• Unknown: No prediction available
REVEL (Rare Exome Variant Ensemble Learner)
Ensemble method combining multiple tools to predict pathogenicity of missense variants.
• Benign: Score ≤ 0.644 (likely neutral)
• Pathogenic: Score > 0.644 (likely disease-causing)
• Unknown: No prediction available
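The score cutoffs listed for SIFT, PolyPhen-2, and REVEL can be expressed as simple classifiers. In this sketch `None` stands for a missing prediction, the SIFT confidence flag is passed in as a hypothetical boolean (in practice it comes from SIFT's own output), and boundary values follow the ranges above (for example, a PolyPhen-2 score of exactly 0.85 is Possibly Damaging). The function names are hypothetical.

```python
def sift_class(score, low_confidence=False):
    """SIFT: < 0.05 deleterious, >= 0.05 tolerated."""
    if score is None:
        return "Unknown"
    label = "Deleterious" if score < 0.05 else "Tolerated"
    return f"{label} Low Confidence" if low_confidence else label


def polyphen_class(score):
    """PolyPhen-2: < 0.15 benign, 0.15-0.85 possibly, > 0.85 probably damaging."""
    if score is None:
        return "Unknown"
    if score < 0.15:
        return "Benign"
    if score <= 0.85:
        return "Possibly Damaging"
    return "Probably Damaging"


def revel_class(score):
    """REVEL: > 0.644 pathogenic, otherwise benign."""
    if score is None:
        return "Unknown"
    return "Pathogenic" if score > 0.644 else "Benign"


print(sift_class(0.01), polyphen_class(0.9), revel_class(0.7))
# Deleterious Probably Damaging Pathogenic
```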
SpliceAI
Deep learning-based tool that predicts splicing alterations by evaluating delta scores for donor/acceptor gain and loss. Variants with any delta score ≥ 0.5 are marked as PASS, indicating a high likelihood of affecting splicing.
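A sketch of the PASS rule, assuming the pipe-delimited `SpliceAI` INFO format written by the SpliceAI VCF annotator (`ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL`, with comma-separated entries per allele or gene). The function names and the example string are hypothetical; the 0.5 cutoff is the one stated above.

```python
def spliceai_max_delta(info_value: str) -> float:
    """Maximum delta score across all comma-separated SpliceAI predictions."""
    best = 0.0
    for prediction in info_value.split(","):
        fields = prediction.split("|")
        # fields[2:6] are DS_AG, DS_AL, DS_DG, DS_DL
        scores = [float(x) for x in fields[2:6] if x not in ("", ".")]
        if scores:
            best = max(best, max(scores))
    return best


def spliceai_pass(info_value: str, cutoff: float = 0.5) -> bool:
    """PASS if any delta score reaches the cutoff."""
    return spliceai_max_delta(info_value) >= cutoff


example = "T|GENE1|0.00|0.02|0.91|0.00|-12|3|1|-40"
print(spliceai_max_delta(example))  # 0.91
print(spliceai_pass(example))       # True
```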
dbSNP (Database of Single Nucleotide Polymorphisms)
NCBI database cataloging known genetic variations (rs IDs). An rs ID indicates that the variant has been previously reported.
COSMIC (Catalogue Of Somatic Mutations In Cancer)
Comprehensive database of somatic mutations in cancer. A COSMIC ID indicates that the variant has been observed in cancer samples.
ClinVar
NCBI database of clinically significant variants. Classifications: Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign, Benign

Data Privacy & Security

All patient data is de-identified in accordance with institutional review board (IRB) protocols. The platform implements role-based access control to ensure data security and compliance with privacy standards.

Statistical Methods

Log-Rank Test

Statistical test to compare survival distributions between two or more groups. A p-value < 0.05 indicates a statistically significant difference in survival between groups. The platform requires a minimum of 10 samples per group to calculate log-rank test statistics.
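For two groups, the log-rank statistic compares observed and expected events at each distinct event time and is referred to a chi-square distribution with one degree of freedom. The pure-Python sketch below handles the two-group case only, enforces the 10-samples-per-group minimum described above, and uses `math.erfc` for the 1-df chi-square tail probability; `log_rank_test` is a hypothetical name, not the platform's API.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b, min_group_size=10):
    """Two-group log-rank test. Events: 1 = event, 0 = censored.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    if min(len(times_a), len(times_b)) < min_group_size:
        raise ValueError(f"each group needs at least {min_group_size} samples")
    # Pool both groups, tagging each observation with its group (0 = A, 1 = B)
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk in A
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk overall
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this time point
            variance += d * (n_a / n) * ((n - n_a) / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative events to compare")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


early = list(range(1, 11))   # made-up survival times (months)
late = list(range(11, 21))
chi2, p = log_rank_test(early, [1] * 10, late, [1] * 10)
print(f"chi2={chi2:.2f}, p={p:.2g}")
```

Comparing two groups with identical survival times gives a statistic of 0 and p = 1.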

Mutation Frequency Analysis

Calculation of mutation prevalence across samples and genes, including identification of significantly mutated genes and hotspot mutations within protein domains.
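As a minimal sketch of the prevalence calculation, the per-gene mutation frequency below is the fraction of samples carrying at least one retained variant in that gene. It assumes every sample was sequenced for every gene; with several different panels in use, a more faithful denominator would be the number of samples whose panel covers the gene. It does not attempt significance testing, which requires a background mutation-rate model. The function name and example data are hypothetical.

```python
from collections import defaultdict


def gene_mutation_frequency(mutations, n_samples):
    """Fraction of samples with at least one variant per gene.

    mutations: iterable of (sample_id, gene) pairs, one per retained variant.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene in mutations:
        samples_by_gene[gene].add(sample_id)  # a sample counts once per gene
    freq = {gene: len(ids) / n_samples for gene, ids in samples_by_gene.items()}
    # Most frequently mutated genes first, ties broken alphabetically
    return dict(sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])))


mutations = [("S1", "TP53"), ("S1", "TP53"), ("S2", "TP53"),
             ("S2", "KRAS"), ("S3", "PIK3CA")]
print(gene_mutation_frequency(mutations, n_samples=4))
# {'TP53': 0.5, 'KRAS': 0.25, 'PIK3CA': 0.25}
```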

This documentation is continuously updated to reflect the latest features and methodologies implemented in KOSMOS2. For additional questions or clarifications, please contact the support team.
