More than two decades after the Human Genome Project was celebrated as a scientific milestone, scientists have finally finished the job. The first complete, gap-free sequence of the human genome has been published, a breakthrough expected to pave the way for new insights into health and what makes our species unique.
Dr Karen Miga, a scientist at the University of California, Santa Cruz who co-led the international consortium behind the project, said: “These parts of the human genome that we have not been able to study for more than 20 years are important to our understanding of how genomes work, genetic disease, human diversity and evolution.”
Until now, about 8% of the human genome had been missing, including large stretches of highly repetitive sequences, sometimes described as “junk DNA”. In fact, these repetitive sections were omitted because of technical difficulties in sequencing them, rather than a lack of interest.
Genome sequencing is something like cutting a book into snippets of text and then trying to rebuild the book by piecing them back together. Stretches of text that contain a lot of common or repetitive words and phrases are harder to put in the right place than unique parts of the text. New “long-read” sequencing technologies that decode large pieces of DNA at once – enough to capture many repeats – have helped overcome this hurdle.
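The book analogy can be made concrete with a toy example. The sketch below is purely illustrative and is not the consortium's method: the "genome" string, the repeat unit and the two read lengths are all invented for the demonstration. It counts how many snippets of a given length could be placed in more than one position.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position (overlaps included) where the read matches the genome."""
    return sum(
        1
        for start in range(len(genome) - len(read) + 1)
        if genome[start:start + len(read)] == read
    )


def ambiguous_fraction(genome: str, read_length: int) -> float:
    """Fraction of all possible reads of this length that fit in more than one place."""
    reads = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]
    ambiguous = sum(1 for read in reads if count_placements(genome, read) > 1)
    return ambiguous / len(reads)


# Hypothetical toy genome: unique flanks around a repeat unit ("ACGT") copied six times.
toy_genome = "TTGACCA" + "ACGT" * 6 + "GGCTAAC"

# Short reads (6 letters) often fall entirely inside the 24-letter repeat.
short = ambiguous_fraction(toy_genome, read_length=6)
# Long reads (28 letters) always span the repeat and touch a unique flank.
long_ = ambiguous_fraction(toy_genome, read_length=28)

print(f"ambiguous short reads: {short:.0%}")
print(f"ambiguous long reads: {long_:.0%}")
```

In this toy case, more than half of the 6-letter reads match several positions, while every 28-letter read has exactly one possible placement because it is longer than the whole repeat. Real genomes and real sequencing reads are far messier, but the same principle is why reads long enough to capture many repeats help.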
Scientists were able to simplify the puzzle further by using an unusual cell type that contains only DNA inherited from the father (most cells in the body contain two genomes – one from each parent). Together, these two advances have allowed for the decoding of the more than 3 billion letters that make up the human genome.
“In the future, when a person’s genome is sequenced, we will be able to identify all the variants in their DNA and use that information to better guide healthcare,” said Dr Adam Phillippy, of the National Human Genome Research Institute in Maryland and co-chair of the consortium. “Truly completing the sequencing of the human genome was like putting on a new pair of glasses. Now that we can see everything clearly, we are one step closer to understanding what it all means.”
Of particular interest is that the parts of the genome containing many repetitive stretches include those with the most human genetic variation. Variation within these regions may also provide important clues to how our human ancestors underwent rapid evolutionary changes that led to more complex cognition.
The work is also likely to lead to a better understanding of the mysterious components of the genome known as centromeres. These are dense bundles of DNA that hold chromosomes together and play a role in cell division, but until now they were considered unmappable because they contain thousands of stretches of DNA sequence that repeat over and over again.
The science behind the sequencing effort and some preliminary analyses of the new genome regions are outlined in six research papers published in the journal Science.
“When we unlock these new parts of the genome, we think there will be genetic diversity that contributes to many different traits and disease risks,” said Rajiv McCoy, of Johns Hopkins University and a co-author from the Telomere-to-Telomere (T2T) consortium. “There’s an aspect of this that’s like, ‘We don’t know yet what we don’t know.’”