How to use GI-DB and interpret GenomeIndia variant data
Search modes, result pages, and annotations
This page describes the Genome India Database (GI-DB): the underlying project, data production and pipeline, variant annotation, population structure, and how to use the browser and API.
GI-DB (Genome India Database) is a public resource that aggregates and serves allele frequencies and annotations for genetic variants from the Genome India Project. It provides a searchable catalogue of variants across the Indian population, with population-specific and overall frequencies, functional annotations, and links to external databases. The resource is intended for researchers and clinicians interested in population genetics, rare variant interpretation, and precision medicine in the Indian context.
The Genome India Project is a national initiative funded by the Department of Biotechnology (DBT), Government of India, launched in January 2020. Its goal is to sequence genomes from healthy Indian individuals representing diverse population groups across the country.
Sampling across diverse population groups is intended to make the database representative of India's genetic diversity and to support the identification of population-specific and rare variants.
GI-DB follows best practices for variant calling and quality control, analogous to those used in large-scale resources such as gnomAD.
All data are aligned and called against the GRCh38/hg38 reference genome. Processing is performed using the DRAGEN (Dynamic Read Analysis for GENomics) pipeline for alignment, duplicate marking, and variant calling. Both single-nucleotide variants (SNVs) and short indels are included, with allele counts and frequencies computed across the full cohort.
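As a rough illustration of how an allele count (AC), allele number (AN), and allele frequency (AF = AC / AN) relate at a single biallelic site, here is a minimal sketch. It assumes VCF-style diploid genotype strings and is not a description of the DRAGEN pipeline or of how GI-DB computes its values; the function names are illustrative only.

```python
def allele_counts(genotypes):
    """Return (AC, AN) for one biallelic site from diploid genotype strings.

    Assumes VCF-style genotypes such as "0/1" or "0|1", where "0" is the
    reference allele, "1" is the alternate allele, and "." is a missing call.
    """
    ac = 0  # number of alternate alleles observed
    an = 0  # number of called alleles (reference + alternate)
    for gt in genotypes:
        # Accept both unphased ("/") and phased ("|") separators.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # a missing call adds to neither AC nor AN
            an += 1
            if allele == "1":
                ac += 1
    return ac, an


def allele_frequency(ac, an):
    """AF = AC / AN, or None when no alleles were called at the site."""
    return ac / an if an else None


# Made-up genotypes for five samples, one of which has a missing call.
genotypes = ["0/0", "0/1", "1/1", "./.", "0|1"]
ac, an = allele_counts(genotypes)
print(ac, an, allele_frequency(ac, an))  # 4 8 0.5
```

Because missing calls are excluded from AN, the denominator can differ from variant to variant even within the same cohort.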
Each variant in GI-DB is annotated to support interpretation and filtering.
Variants are annotated with predicted functional consequence (e.g. synonymous, missense, loss-of-function) and gene/transcript context. Consequence types in the database include, among others:
For each variant, the database stores:
These fields are shown in the variant pages and are available via the API for gene and region queries.
The cohort is structured into 83 population groups, reflecting geographic, linguistic, and ethnic diversity in India. Frequencies can be aggregated overall or by population, enabling:
Population labels and sample sizes are described in the project publications and may be summarized in the browser or in downloadable metadata.
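The difference between population-specific and overall frequencies can be sketched as follows. The population labels and counts are invented for illustration and do not correspond to actual GI-DB groups or data. The point is that an overall frequency is obtained by pooling allele counts, not by averaging per-population frequencies.

```python
# Hypothetical per-population counts for one variant: label -> (AC, AN).
pop_counts = {
    "POP_A": (3, 200),
    "POP_B": (0, 100),
    "POP_C": (12, 300),
}


def population_frequencies(counts):
    """Allele frequency within each population (None if AN is zero)."""
    return {pop: (ac / an if an else None) for pop, (ac, an) in counts.items()}


def overall_frequency(counts):
    """Pooled allele frequency: total AC divided by total AN."""
    total_ac = sum(ac for ac, _ in counts.values())
    total_an = sum(an for _, an in counts.values())
    return total_ac / total_an if total_an else None


per_pop = population_frequencies(pop_counts)
overall = overall_frequency(pop_counts)

# The unweighted mean of per-population frequencies ignores sample size
# and therefore generally differs from the pooled value.
unweighted_mean = sum(per_pop.values()) / len(per_pop)

print(per_pop)
print(overall, unweighted_mean)
```

Here the pooled frequency is 15 / 600 = 0.025, while the unweighted mean of the three population frequencies is lower, because the smallest population (with no carriers) is given the same weight as the larger ones.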
Summary statistics for the current release are available on the Stats page and give an overview of the resource scale.
Approximate distributions of variant consequences (e.g. intergenic, intronic, missense, synonymous) and of exonic variant types (e.g. synonymous, missense, nonsense, frameshift) are provided to illustrate the composition of the dataset.
Exact numbers may be updated with new releases; refer to the Stats page and publication for current figures.
The GI-DB website allows you to search and explore variants by gene, variant, region, or rsID.
| Query type | Example | Description |
|---|---|---|
| Gene | BRCA1 | Returns variants overlapping the gene (by symbol). |
| Variant | chr7:117504290-C-T | Exact variant by chromosome, position, ref, alt. |
| Region | chr22:23727262-23777262 | All variants in the given genomic interval. |
| rsID | rs1000000 | Variant by dbSNP identifier. |
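When preparing queries in a script, it can help to check which of the four formats a string matches before submitting it. The sketch below infers its patterns solely from the examples in the table above; the browser's actual parsing rules are not documented here, so the regular expressions, the accepted chromosome names, and the function name `classify_query` are all assumptions.

```python
import re

# Assumed chromosome names: optional "chr" prefix, then 1-22, X, Y, M or MT.
CHROM = r"(?:chr)?(?:[1-9]|1[0-9]|2[0-2]|X|Y|M|MT)"

VARIANT_RE = re.compile(rf"^({CHROM}):(\d+)-([ACGT]+)-([ACGT]+)$")
REGION_RE = re.compile(rf"^({CHROM}):(\d+)-(\d+)$")
RSID_RE = re.compile(r"^rs\d+$", re.IGNORECASE)
GENE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_.-]*$")


def classify_query(query):
    """Return (query_type, parsed_fields) for a search string.

    Raises ValueError if the string matches none of the four formats.
    """
    q = query.strip()

    m = VARIANT_RE.match(q)
    if m:
        return "variant", {
            "chrom": m[1], "pos": int(m[2]), "ref": m[3], "alt": m[4],
        }

    m = REGION_RE.match(q)
    if m:
        start, end = int(m[2]), int(m[3])
        if start > end:
            raise ValueError(f"region start exceeds end: {q!r}")
        return "region", {"chrom": m[1], "start": start, "end": end}

    # rsIDs are checked before gene symbols, since "rs123" would
    # otherwise also satisfy the looser gene-symbol pattern.
    if RSID_RE.match(q):
        return "rsid", {"rsid": q}

    if GENE_RE.match(q):
        return "gene", {"symbol": q}

    raise ValueError(f"unrecognised query: {q!r}")


for example in ["BRCA1", "chr7:117504290-C-T",
                "chr22:23727262-23777262", "rs1000000"]:
    print(example, "->", classify_query(example)[0])
```

The order of the checks matters: the most specific patterns are tried first, and the permissive gene-symbol pattern is used only as a fallback.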
After searching, you can open a variant to see:
Programmatic access is provided via the GI-DB API. Supported query types include:
Responses include variant identifiers, coordinates, allele frequencies, and annotations. See the API documentation for endpoints, parameters, and rate limits.
The flagship manuscript describing the Genome India cohort, pipeline, and variant catalogue will be published soon. Once available, please cite that publication when using GI-DB or Genome India data.
For the database and web resource, please acknowledge: GI-DB – Genome India Database (https://gidb.igib.res.in), maintained by CSIR-IGIB.