GI-DB Documentation

This page describes the Genome India Database (GI-DB): the underlying project, data production and pipeline, variant annotation, population structure, and how to use the browser and API.

About GI-DB

GI-DB (Genome India Database) is a public resource that aggregates and serves allele frequencies and annotations for genetic variants from the Genome India Project. It provides a searchable catalogue of variants across the Indian population, with population-specific and overall frequencies, functional annotations, and links to external databases. The resource is intended for researchers and clinicians interested in population genetics, rare variant interpretation, and precision medicine in the Indian context.

The Genome India Project

The Genome India Project is a national initiative funded by the Department of Biotechnology (DBT), Government of India, and launched in January 2020. It aims to sequence the genomes of healthy Indian individuals representing diverse population groups across the country.

Sample design

This design ensures the database captures genetic diversity representative of India and supports the identification of population-specific and rare variants.

Data production and pipeline

GI-DB follows best practices for variant calling and quality control, analogous to those used in large-scale resources such as gnomAD.

Reference genome and pipeline

All data are aligned and called against the GRCh38/hg38 reference genome. Processing is performed using the DRAGEN (Dynamic Read Analysis for GENomics) pipeline for alignment, duplicate marking, and variant calling. Both single-nucleotide variants (SNVs) and short indels are included, with allele counts and frequencies computed across the full cohort.

Quality control

Variant annotation

Each variant in GI-DB is annotated to support interpretation and filtering.

Identifiers and location

Functional annotation

Variants are annotated with predicted functional consequence (e.g. synonymous, missense, loss-of-function) and gene/transcript context. Consequence types in the database include, among others:

Frequency and counts

For each variant, the database stores:

These fields are shown in the variant pages and are available via the API for gene and region queries.
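
As a minimal sketch of how a frequency relates to the underlying counts, the snippet below assumes each variant carries an alternate allele count and a total number of called alleles. The names AlleleCounts, ac, and an are illustrative assumptions that follow a common convention; they are not taken from the GI-DB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleCounts:
    """Hypothetical container; the field names are assumptions, not GI-DB fields."""
    ac: int  # number of observed alternate alleles
    an: int  # total number of called alleles at the site


def allele_frequency(counts: AlleleCounts) -> float:
    """Return ac / an, or 0.0 when no alleles were called at the site."""
    if counts.ac < 0 or counts.an < 0 or counts.ac > counts.an:
        raise ValueError("expected 0 <= ac <= an")
    if counts.an == 0:
        return 0.0
    return counts.ac / counts.an


# Example: 5 alternate alleles among 1000 called alleles.
example_af = allele_frequency(AlleleCounts(ac=5, an=1000))
```

Returning 0.0 for a site with no called alleles is a design choice made here for simplicity; a stricter variant of this sketch could return None to distinguish "not observed" from "not called".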

Population structure

The cohort is structured into 83 population groups, reflecting geographic, linguistic, and ethnic diversity in India. Frequencies can be aggregated overall or by population, enabling:

Population labels and sample sizes are described in the project publications and may be summarized in the browser or in downloadable metadata.
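
The difference between per-population and overall frequencies can be sketched as follows. The example assumes that, for one variant, an (allele count, allele number) pair is available for each population; the labels POP_A and POP_B and the function names are placeholders, not GI-DB population labels or API calls.

```python
def population_frequencies(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Frequency within each population, from (allele count, allele number) pairs."""
    return {pop: (ac / an if an else 0.0) for pop, (ac, an) in counts.items()}


def pooled_frequency(counts: dict[str, tuple[int, int]]) -> float:
    """Overall frequency: sum the counts first, then divide once."""
    total_ac = sum(ac for ac, _ in counts.values())
    total_an = sum(an for _, an in counts.values())
    return total_ac / total_an if total_an else 0.0


# Made-up counts for illustration only.
counts = {"POP_A": (2, 100), "POP_B": (0, 300)}
by_population = population_frequencies(counts)  # POP_A: 0.02, POP_B: 0.0
overall = pooled_frequency(counts)              # 2 / 400 = 0.005
```

Pooling the counts weights each population by the number of alleles it contributes, so the overall value (0.005 here) generally differs from the unweighted mean of the per-population frequencies (0.01 here).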

Database statistics

Summary statistics for the current release are available on the Stats page and give an overview of the resource scale.

Variant counts

Functional distribution

Approximate distributions of variant consequences (e.g. intergenic, intronic, missense, synonymous) and of exonic variant types (e.g. synonymous, missense, nonsense, frameshift) are provided to illustrate the composition of the dataset.

Exact numbers may be updated with new releases; refer to the Stats page and publication for current figures.

Using the browser

The GI-DB website lets you search and explore variants by gene symbol, variant identifier, genomic region, or rsID.

Search types

Query type    Example                    Description
Gene          BRCA1                      Returns variants overlapping the gene (by symbol).
Variant       chr7:117504290-C-T         Exact variant by chromosome, position, ref, alt.
Region        chr22:23727262-23777262    All variants in the given genomic interval.
rsID          rs1000000                  Variant by dbSNP identifier.
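
The four query forms in the table above can be told apart by their shape alone. The sketch below shows one way a client script might classify a query string before submitting it; the patterns are inferred from the examples in the table and are an assumption about accepted syntax, not the browser's actual parser.

```python
import re

# Patterns inferred from the documented examples; the real parser may differ.
CHROM = r"(?:chr)?(?:[0-9]{1,2}|X|Y|M|MT)"
RSID_RE = re.compile(r"^rs\d+$", re.IGNORECASE)
VARIANT_RE = re.compile(rf"^{CHROM}:\d+-[ACGT]+-[ACGT]+$", re.IGNORECASE)
REGION_RE = re.compile(rf"^{CHROM}:(\d+)-(\d+)$", re.IGNORECASE)
GENE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*$")


def classify_query(query: str) -> str:
    """Return 'rsid', 'variant', 'region', 'gene', or 'unknown'."""
    q = query.strip()
    if RSID_RE.match(q):
        return "rsid"
    if VARIANT_RE.match(q):
        return "variant"
    region = REGION_RE.match(q)
    if region:
        start, end = int(region.group(1)), int(region.group(2))
        # An interval whose start exceeds its end is not a usable region.
        return "region" if start <= end else "unknown"
    if GENE_RE.match(q):
        return "gene"
    return "unknown"


for example in ("BRCA1", "chr7:117504290-C-T", "chr22:23727262-23777262", "rs1000000"):
    print(example, "->", classify_query(example))
```

The rsID check runs before the gene check because an identifier such as rs1000000 would otherwise also pass the looser gene-symbol pattern.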

Variant page

After searching, you can open a variant to see:

Data access and citation

API

Programmatic access is provided via the GI-DB API. Supported query types include:

Responses include variant identifiers, coordinates, allele frequencies, and annotations. See the API documentation for endpoints, parameters, and rate limits.
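
Once a response has been retrieved, a typical next step is to filter it by frequency. The sketch below assumes the response body is a JSON array of objects with variant_id, af, and consequence keys. Both the structure and the key names are assumptions made for illustration, and the records are made up; consult the API documentation for the actual response format.

```python
import json
from typing import Any


def rare_variants(records: list[dict[str, Any]], max_af: float = 0.01) -> list[dict[str, Any]]:
    """Keep records whose allele frequency is at most max_af, rarest first."""
    selected = [r for r in records if r.get("af") is not None and r["af"] <= max_af]
    return sorted(selected, key=lambda r: r["af"])


# Made-up payload with hypothetical field names, standing in for an API response body.
payload = """
[
  {"variant_id": "chr1:1000-A-G", "af": 0.12,   "consequence": "intronic"},
  {"variant_id": "chr1:2000-C-T", "af": 0.004,  "consequence": "missense"},
  {"variant_id": "chr1:3000-G-A", "af": 0.0005, "consequence": "synonymous"},
  {"variant_id": "chr1:4000-T-C", "af": null,   "consequence": "missense"}
]
"""

records = json.loads(payload)
rare_ids = [r["variant_id"] for r in rare_variants(records)]
```

Records without a frequency are skipped rather than treated as rare, since a missing value does not indicate that the allele is uncommon.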

Citation

The flagship manuscript describing the Genome India cohort, pipeline, and variant catalogue will be published soon. Once available, please cite that publication when using GI-DB or Genome India data.

For the database and web resource, please acknowledge: GI-DB – Genome India Database (https://gidb.igib.res.in / maintained by CSIR-IGIB).