
Comparative genomics of flowering plants

Plant genomes are known for their remarkable diversity and complexity, even among eukaryotes, which has made them challenging to assemble for decades.

In the era of long-read sequencing technologies, however, plant genomic resources are richer than ever, with multiple high-quality, chromosome-resolved genome assemblies published every year. 

Taking advantage of this, we performed a comparative analysis of 30 eudicot genomes.

Genomic Structural Variation (SV)

Genomic variation is the raw material for evolution. It can be found at the micro scale, as individual nucleotide mutations (substitutions, small insertions and deletions, and single nucleotide polymorphisms, or SNPs), as well as at larger scales. Variations that span a larger region of the genome are referred to as structural variations, or SVs. Examples include:

  • Inversion 

  • Duplication

  • Translocation 

SVs can occur at any size, from small segments (a single gene) to very large ones, up to the whole genome (as in polyploidy).
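When two genomes are compared, an inversion typically shows up as a run of alignment blocks whose orientation is flipped relative to their neighbours. The sketch below is a minimal, hypothetical illustration of that idea, assuming whole-genome alignments have already been reduced to blocks sorted along one reference chromosome and labelled by strand. The function name `call_inversions` and the toy blocks are made up for this example; this is not the pipeline used in this project, and real tools also check query coordinates, chromosome assignment and nested rearrangements, which this toy version ignores.

```python
from typing import List, Optional, Tuple

# Hypothetical toy input: alignment blocks on one reference chromosome,
# each (ref_start, ref_end, strand); "-" = query aligns reverse-complemented.
Block = Tuple[int, int, str]
Interval = Tuple[int, int]


def call_inversions(blocks: List[Block]) -> List[Interval]:
    """Merge runs of consecutive reverse-strand blocks into candidate
    inversion intervals in reference coordinates (simplified sketch)."""
    inversions: List[Interval] = []
    current: Optional[Interval] = None  # run of "-" blocks being extended
    for start, end, strand in sorted(blocks):
        if strand == "-":
            # Start a new run, or extend the current one
            if current is None:
                current = (start, end)
            else:
                current = (current[0], max(current[1], end))
        elif current is not None:
            # A forward-strand block closes the current run
            inversions.append(current)
            current = None
    if current is not None:  # run reaching the last block
        inversions.append(current)
    return inversions


blocks = [
    (0, 100, "+"),
    (100, 250, "-"),
    (250, 400, "-"),
    (400, 500, "+"),
    (600, 700, "-"),
]
print(call_inversions(blocks))  # [(100, 400), (600, 700)]
```

Merging adjacent reverse-strand blocks matters because a single inversion is often split into several alignment blocks by small indels or divergent stretches inside it.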

Role of inversions in adaptation and speciation

Inversions are particularly interesting because of their potential role in adaptation. They suppress recombination in heterozygous individuals: the inverted and standard segments cannot pair normally during meiosis, and crossovers that do occur within the inverted region tend to produce unbalanced gametes. How inversions contribute to adaptation or speciation has long been debated.


It is worth remembering that most mutations of any type that change the phenotype are deleterious. As a result, many SVs that arose in the past are absent from present-day populations because they were lost quickly, sometimes within a single generation. Some inversions, however, persisted long enough to be observed in present-day genomes. The local adaptation hypothesis has gained the most empirical support as an explanation for this. Inversions spanning multiple genes (sometimes hundreds) have been observed and linked to the adaptation of species or populations to their local environment in many different organisms (e.g., sunflower, monkeyflower, invasive crab, white-throated sparrow, ruff).

What about inversions that do not contribute to adaptation? 

While adaptive inversions receive a lot of attention, comparatively little is known about how inversions behave under neutrality. For any evolutionary interpretation to make sense, one must first show a significant deviation from neutral expectations. Not all inversions are adaptive, and they need not be adaptive in order to become fixed.

 

There are well-established theories about inversions and their molecular mechanisms, but empirical support for them is scarce. In this project, we tried to address the following questions:

  • How often do inversions occur in nature?

  • How fast do inversions become fixed in a population by genetic drift (random chance)?

  • What molecular mechanisms drive inversions?

  • What are the potential consequences of neutral inversions? 
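The drift question has a classic neutral baseline from population genetics: a new neutral variant starts at frequency 1/(2N) in a diploid population of N individuals, becomes fixed with that same probability, and, when it does fix, takes about 4N generations on average. The Wright-Fisher simulation below is a minimal sketch of that baseline, assuming a constant population size, random mating, and a strictly neutral inversion with no fertility cost in heterozygotes. The function `simulate_neutral_inversion` and its parameters are hypothetical names for illustration. Inversions that produce unbalanced gametes in heterozygotes would fix less often than this.

```python
import numpy as np


def simulate_neutral_inversion(n_diploid: int, replicates: int, seed: int = 1):
    """Wright-Fisher drift of one new, strictly neutral inversion (sketch).

    Returns (fraction of replicates in which it fixed,
             mean generations to fixation among those replicates).
    """
    rng = np.random.default_rng(seed)
    two_n = 2 * n_diploid                         # chromosome copies per generation
    counts = np.ones(replicates, dtype=np.int64)  # each replicate starts with 1 inverted copy
    fix_gen = np.full(replicates, -1, dtype=np.int64)
    generation = 0
    seg = (counts > 0) & (counts < two_n)         # replicates still segregating
    while seg.any():
        generation += 1
        # Binomial resampling of 2N copies: pure genetic drift, no selection
        counts[seg] = rng.binomial(two_n, counts[seg] / two_n)
        fix_gen[seg & (counts == two_n)] = generation
        seg = (counts > 0) & (counts < two_n)
    fixed = fix_gen >= 0
    return fixed.mean(), fix_gen[fixed].mean()


p_fix, t_fix = simulate_neutral_inversion(n_diploid=50, replicates=20_000)
print(f"P(fixation) = {p_fix:.4f}   neutral theory 1/(2N) = {1 / 100:.4f}")
print(f"mean time to fixation = {t_fix:.0f} generations   theory ~4N = 200")
```

The simulated values should land close to 1/(2N) and 4N; observed fixation rates that differ clearly from such a baseline are what would point to selection or other non-neutral forces.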

We compared 32 paired genera (64 eudicot species):
Figure: Density of sequence identity score distribution in inversions vs. syntenic regions for the 32 species pairs (x-axis: nucleotide sequence identity).

Most species pairs show a similar distribution of sequence divergence in inversions and in syntenic regions. A few pairs, such as Salvia, are exceptions: their inversions and syntenic regions differ in both the shape of the distribution and its peak position (mean sequence divergence). This may indicate a different evolutionary history for these inversions, which would be interesting to follow up.
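One way to make "different shape and peak position" concrete is to compare the two identity distributions directly. The sketch below assumes per-alignment identity scores between 0 and 1 for the inversions and the syntenic regions of one species pair. It reports the mean divergence (1 - identity) of each set and the two-sample Kolmogorov-Smirnov statistic D, the largest gap between the two empirical CDFs, which responds to differences in both location and shape. The function name and the toy scores are hypothetical, not values from this study; `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
import numpy as np


def compare_identity(inv: np.ndarray, syn: np.ndarray) -> dict:
    """Compare identity scores (0-1) from inversions vs syntenic regions."""
    inv, syn = np.sort(inv), np.sort(syn)
    grid = np.concatenate([inv, syn])  # ECDFs only change at observed values
    cdf_inv = np.searchsorted(inv, grid, side="right") / inv.size
    cdf_syn = np.searchsorted(syn, grid, side="right") / syn.size
    return {
        "mean_div_inv": float(1.0 - inv.mean()),  # mean divergence = 1 - identity
        "mean_div_syn": float(1.0 - syn.mean()),
        "ks_D": float(np.max(np.abs(cdf_inv - cdf_syn))),  # two-sample KS statistic
    }


# Toy, made-up scores for illustration only (not data from this study)
inv_scores = np.array([0.90, 0.92, 0.94])
syn_scores = np.array([0.95, 0.97, 0.99])
print(compare_identity(inv_scores, syn_scores))
```

In this toy case `ks_D` is 1.0 because the two samples do not overlap at all. With real alignments, neighbouring blocks are not independent, so any p-value should be treated with caution.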

Are inversions associated with transposable elements or coding sequences?

If structural variants like inversions are mostly deleterious, we expect inversions to occur in regions that are less functionally important, i.e., regions poor in coding sequences (CDS).

Because mobile, repetitive elements such as transposable elements (TEs) tend to accumulate in regions of low recombination and can promote structural rearrangements, we also expect inversions to occur in TE-enriched regions.

We observed that both expectations generally hold.

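Comparing TE and CDS content between inversions and syntenic regions comes down to measuring how much of each region is covered by annotations. The sketch below assumes annotations are given as half-open [start, end) intervals on the same chromosome as the region; the helpers `merge` and `covered_fraction` and the toy coordinates are hypothetical. In practice, tools such as `bedtools coverage` report the same covered fraction directly from BED files.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so bases are not double-counted."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(region: Interval, features: List[Interval]) -> float:
    """Fraction of bases in `region` covered by at least one feature
    (e.g. TE or CDS annotations)."""
    r_start, r_end = region
    covered = 0
    for f_start, f_end in merge(features):
        # Overlap length is zero when the feature lies outside the region
        covered += max(0, min(r_end, f_end) - max(r_start, f_start))
    return covered / (r_end - r_start)


# Hypothetical toy annotations for one 1-kb inverted region
tes = [(0, 300), (200, 500), (900, 1000)]
cds = [(600, 650)]
print(covered_fraction((0, 1000), tes))  # 0.6
print(covered_fraction((0, 1000), cds))  # 0.05
```

Averaging these fractions over all inversions and all syntenic regions of a species pair gives one simple way to compare TE and CDS content between the two classes.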

However, our data cannot tell whether

1. inversions tend to occur in regions with high TE and low CDS content,

 OR

2. inversions that can be fixed by chance tend to accumulate TEs and end up with low CDS content.

Distinguishing these will require further testing and a different research design. It's the chicken-or-egg dilemma...

Interested in this comparative genomics pipeline?

Check out my GitHub!
