With GenomeIndia, population-scale genomics comes of age in India

In one way or another, all living people on the earth can trace their ancestry to Africa and therefore carry some portion of African ancestral DNA — some to a significant degree, others less so. If we all originated from the same African ancestors, why do we look so different?

Imagine two grains of paddy, one red and one blue. You plant them in two isolated locations, and each grows into a healthy rice plant, producing bags of red and blue rice, respectively. The next time, you sow a single field with thousands of these red and blue grains. You no longer get just red and blue rice. Alongside red and blue, you will begin to see grains of other colours and, over successive generations, a whole spectrum of colours.

This is because rice reproduces sexually: through genetic recombination during reproduction, offspring inherit not only the traits of their parent grains but also new, unique traits. Over time, spontaneous DNA changes — known as mutations — also introduce new characteristics, some beneficial and others detrimental. Both recombination and mutation result in genetic diversity.

Profound implications

Given an average human generation length of 25-30 years, around 8,000 to 10,000 generations have passed since modern humans emerged in Africa roughly 200,000-300,000 years ago. Evolution tends to eliminate harmful genetic changes while allowing beneficial ones to persist, especially when people from different genetic backgrounds have children together. However, if a population remains isolated and marriage occurs only within the group over generations, harmful mutations may persist because they lack the opportunity to be diluted or eliminated.
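The generation count above is simple division, and a rough sketch makes the pairing of assumptions visible. The figures are the ones quoted in the paragraph; the function and variable names are illustrative only. Pairing the shorter time span with the shorter generation length gives 8,000, and the longer span with the longer length gives 10,000; mixing the extremes widens the range somewhat.

```python
# Rough estimate of human generations since modern humans emerged.
# Inputs are the figures quoted in the text; names are illustrative.

def generations(years, generation_length):
    """Number of generations that fit into a span of years."""
    return years / generation_length

spans = (200_000, 300_000)   # years since modern humans emerged
lengths = (25, 30)           # average generation length in years

# Every combination of time span and generation length.
estimates = {
    (years, length): round(generations(years, length))
    for years in spans
    for length in lengths
}

low = min(estimates.values())
high = max(estimates.values())

for (years, length), count in sorted(estimates.items()):
    print(f"{years:>7,} years / {length} years per generation = {count:,}")
print(f"Widest range: {low:,} to {high:,} generations")
```

Either way, the order of magnitude is the same: several thousand generations, which is ample time for recombination and mutation to accumulate.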

To understand how humans originated and migrated across the globe, researchers examine both the DNA of present-day individuals and ancient DNA from archaeological remains. Comparing the two helps track how our genomes have changed over time.

But beyond tracing population history, studying present-day human DNA has profound implications for understanding human diseases, especially for developing diagnostics and treatments. India, in particular, offers a unique opportunity for such research due to its numerous endogamous populations: groups that have practiced marriage within their own communities for centuries. Several tribal groups have also remained genetically isolated for extended periods. Studying these populations can yield insights into both disease-causing and protective genetic traits.

For example, if a tribal group thrives at high altitudes, their DNA may contain beneficial markers that could help assess high-altitude readiness. Similarly, identifying mutations that predispose a specific population to a disease can lead to targeted drug development. Hence, sequencing the DNA of individuals from diverse, unrelated, and isolated populations is critical, and India is uniquely positioned for this research.

Considerable effort

Recognising this, the Government of India launched the GenomeIndia National Consortium in 2017 to sequence the DNA of nearly 10,000 unrelated individuals. Some findings were reported this week in a peer-reviewed journal. Researchers selected individuals from 83 population groups, including 30 isolated tribal groups, and collected blood samples from 75 to 160 individuals per group. They extracted DNA and sequenced it using next-generation technologies.
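As a quick consistency check on the sampling figures above, the per-group range can be multiplied out and compared with the stated total. This is only back-of-the-envelope arithmetic on the numbers quoted in the paragraph; the variable names are illustrative, and the actual number sampled in each group is not given here.

```python
# Sanity check: do 83 groups of 75-160 people bracket ~10,000 individuals?
# All figures come from the paragraph above; names are illustrative.

GROUPS = 83
MIN_PER_GROUP, MAX_PER_GROUP = 75, 160
TARGET = 10_000  # "nearly 10,000" individuals

lowest_total = GROUPS * MIN_PER_GROUP    # if every group had the minimum
highest_total = GROUPS * MAX_PER_GROUP   # if every group had the maximum
mean_per_group = TARGET / GROUPS         # average implied by the target

print(f"Possible total: {lowest_total:,} to {highest_total:,}")
print(f"Implied average per group: about {mean_per_group:.0f}")
```

The target of nearly 10,000 sits inside the possible range, implying an average of roughly 120 individuals per group.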

The data underwent quality control, were aggregated across institutions, and were then used to identify genetic variants: parts of the DNA that differ from the reference genome. The study identified over 180 million genetic variants, many of which are not found in existing global databases, which are largely based on individuals of Caucasian descent.
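To make the idea of a variant concrete, here is a toy sketch that compares one short sample sequence with a reference and lists the positions where they differ. It is not the consortium's pipeline: real variant calling works on millions of aligned sequencing reads, weighs base-quality scores, and handles insertions and deletions. The sequences and the function name below are invented for illustration, and only single-letter substitutions are detected.

```python
# Toy illustration of a "variant": a position where a sample's DNA
# differs from the reference. Sequences and names are made up.

def find_variants(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Assumes the two sequences are already
    aligned and of equal length (substitutions only).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]

reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

variants = find_variants(reference, sample)
for position, ref_base, sample_base in variants:
    print(f"position {position}: {ref_base} -> {sample_base}")
```

Scaled up to whole genomes and thousands of individuals, the same comparison is what produces a catalogue of millions of variants.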

This isn’t the first study of genetic diversity in India. Its detailed results are also still pending: a summary of the experimental design and results was published as a ‘Comment’ article in Nature Genetics on April 8; a full scientific paper detailing the research results is still awaited.

This said, the GenomeIndia consortium represents the largest and most comprehensive genomics effort in the country. It marks an important milestone, enabling disease research within populations where both harmful and protective mutations may be more prevalent than elsewhere.

Only the beginning

Then again, this is also only the beginning. The true value will emerge when other researchers use the consortium’s data to identify disease-causing variants, develop diagnostic tools, test drugs, and build AI models to better understand human biology. For this to happen, the results must be validated by other researchers and the data shared openly in line with international genomic data-sharing standards, while ensuring the privacy of individuals and populations.

Encouragingly, the Department of Biotechnology has invited proposals to fund research based on the data generated by the consortium. However, while it’s essential to protect individual and population identities, and their geographic details, the decision to withhold FASTQ files (the raw sequencing data) is a step backward. Science advances through open data sharing.

Indian researchers, including those in the GenomeIndia project, have greatly benefited from open-access datasets from large international initiatives and individual laboratories, especially in the U.S. and Europe. The U.K. Biobank, for instance, has set a benchmark for open data-sharing by providing access to health and whole-genome data from half a million individuals. The GenomeIndia Consortium should follow this example.

Making the raw data publicly available will empower scientists globally and accelerate the next wave of discoveries.

Binay Panda is a professor at the Jawaharlal Nehru University, New Delhi.