Big sequencing efforts have gone a long way toward helping us understand the complexities of polyploidy. However, the bioinformatic approaches to sorting and scoring alleles in next-gen data are generally designed for ease of use in diploid species.
Unlike in a diploid species, where paralogous loci are often a problem that needs to be filtered out, these loci are part of what makes a polyploid so interesting. What is going on with all of those duplicated genes?
An undervalued aspect of identifying duplicated loci in polyploids is where they end up on a chromosome. Re-diploidization, in which a lineage that has undergone whole genome duplication reverts to diploidy over time, can leave gene duplications isolated at the distal ends of homeologs*. Importantly, these distal ends are known for their high concentration of gene families valued for their adaptive potential.
If you are filtering out seemingly duplicate reads from a polyploid, you could be missing out on these isolated paralogs. What do you do?
Well, if your study system is a plant, you probably already stopped reading this. The reproductive habits of plants have allowed for the capture of doubled haploids and other backcrosses or inbred lines that serve as the basis for mapping plant genomes. Getting a haploid perspective for a plant lineage can be as easy as just grabbing a haploid part of the plant.
Those haploids are awfully helpful because they provide a baseline dosage that can be expected in a polyploid individual. Is there a vertebrate alternative? A new paper by Limborg et al. suggests creating gynogenetic haploids and diploids by irradiating sperm/ova and then blocking cell division, a (relatively) straightforward approach that works especially well in taxa with external fertilization.
The resulting gynogenetic haploids can provide clear evidence for duplications:
Haploids only contain a single allele at diploid loci, so the presence of two or more different alleles at a locus signal a duplication where alleles from both duplicates have been assembled into a single locus due to retained sequence similarity.
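To make that logic concrete, here is a minimal sketch in Python of how one might flag candidate duplicated loci from haploid genotype calls. The input layout, the function name `flag_duplicated_loci`, and the `min_individuals` threshold are all assumptions made up for illustration, not anything taken from Limborg et al. Real data would also need read-depth and error filtering before a second allele could be trusted.

```python
from collections import defaultdict


def flag_duplicated_loci(haploid_calls, min_individuals=1):
    """Return loci where haploids show two or more distinct alleles.

    haploid_calls: dict mapping individual ID -> dict mapping locus ID ->
        iterable of alleles observed at that locus (hypothetical layout).
    min_individuals: how many haploids must show >= 2 alleles before the
        locus is flagged (an illustrative threshold, not a published one).
    """
    multi_allele_counts = defaultdict(int)
    for loci in haploid_calls.values():
        for locus, alleles in loci.items():
            # A true single-copy locus in a haploid should show one allele.
            if len(set(alleles)) >= 2:
                multi_allele_counts[locus] += 1
    return sorted(
        locus
        for locus, n in multi_allele_counts.items()
        if n >= min_individuals
    )


# Toy data: three gynogenetic haploids scored at three loci.
calls = {
    "hap1": {"locA": {"A"}, "locB": {"C", "T"}, "locC": {"G"}},
    "hap2": {"locA": {"G"}, "locB": {"C", "T"}, "locC": {"G"}},
    "hap3": {"locA": {"A"}, "locB": {"C"}, "locC": {"G", "A"}},
}

print(flag_duplicated_loci(calls, min_individuals=2))
```

In this toy set only `locB` is flagged at the stricter threshold, because two different haploids carry two alleles there. Requiring support from more than one individual is one simple way to keep a single sequencing error from marking a locus as duplicated.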
The resulting gynogenetic diploids can be used specifically for mapping purposes. Because they retain chromatids from meiosis II, a locus ends up heterozygous only when recombination has occurred between it and the centromere, so recombination frequencies can estimate the distance from a gene to the centromere.
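As a rough worked example of that mapping idea, the sketch below uses the classic half-tetrad shortcut: if y is the proportion of gynogenetic diploid offspring that are heterozygous at a locus for which the mother is heterozygous, the gene-to-centromere distance is approximately 100 × y / 2 centimorgans. This assumes complete interference (at most one crossover between locus and centromere) and is offered as a textbook approximation, not necessarily the calculation used in the paper. The function name and the counts are hypothetical.

```python
def gene_centromere_distance(n_heterozygous, n_total):
    """Estimate gene-centromere distance from half-tetrad genotypes.

    n_heterozygous: gynogenetic diploid offspring scored as heterozygous
        at a locus where the mother is heterozygous.
    n_total: all offspring scored at that locus.
    Returns (y, distance_cM), where y is the heterozygote proportion and
    distance_cM = 100 * y / 2, assuming complete interference.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_heterozygous <= n_total:
        raise ValueError("n_heterozygous must be between 0 and n_total")
    y = n_heterozygous / n_total
    return y, 50.0 * y


# Made-up counts: 24 heterozygotes among 96 gynogenetic diploids.
y, distance_cm = gene_centromere_distance(24, 96)
print(f"y = {y:.2f}, distance = {distance_cm:.1f} cM")
```

Loci near the centromere should give y close to 0, while loci far out on the arm give high y, which is why this kind of estimate gets unreliable for exactly the distal regions discussed above once multiple crossovers become likely.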
Together, these lines could be used to properly genotype polyploids that have undergone genome duplication events and further the ability to ask questions about their evolutionary history, adaptive potential, or ecological relationships with related diploids.
*The confusion of sorting out polyploid homeologs is probably what gave rise to the American Genetics Association workshop that spawned this paper, titled “Escape from Homeolog Hell”, which is about the greatest thing ever.
Cited
Limborg, M. T., Seeb, L. W., & Seeb, J. E. (2016). Sorting duplicated loci disentangles complexities of polyploid genomes masked by genotyping by sequencing. Molecular Ecology.