Arun Sethuraman is a postdoctoral associate with Jody Hey, studying statistical models for divergence population genetics in the Department of Biology at Temple University. You can also find him on Twitter, and on his short story blog.
After nearly six years of researching population genetic structure as a bioinformatician, visualizing it as a pretty palette of colored subpopulations, with traces of leaky color wells that indicate genetic admixture, I was rather taken aback when a field biologist in the midst of my 8 AM wintry Iowa morning ‘crowd’ at my doctoral defense asked: “Okay so there are K = 5 subpopulations. So what?”
I took what seemed like a sacrilegious hour (when it was really just a few seconds) to mull this over, as proverbial moments of my graduate life flashed before my eyes. I conjured up a few trained lines about why population structure matters: it leads to localized accumulations of alleles and to inbreeding, which are possibly detrimental (see Wright’s seminal work, “The Genetical Structure of Populations”, 1949, for a wonderful account of why biologists should care about the presence of population structure). While that was sufficient to pull me through my defense (I passed with flying colors, like my many admixture plots), I have since spent several Starbucks happy hours wondering why we truly care about structure and classification.
And then it occurred to me that our quest to infer population structure is no different from the classical species problem, with blurry, crisscrossed lines of thought running between the philosophical and the biological. It enjoys the same triumphs when patterns (of differentiation or admixture) and processes (causal or causing) coincide, and suffers the same pitfalls when the patterns are not (obviously) biologically significant or important.
Consider the recent, rather scandalous revelation of the mixed South Asian ancestry of Prince William, or the evidence of human-Neanderthal interbreeding, or the much deeper question of purported Aryan/European ‘superiority’ over Dravidian/South Asian peoples, a question steeped in controversy and in years of racial segregation, division, war, and crime, which was upturned by a SNP study (Metspalu et al. 2011, AJHG). These are all examples of the former situation, wherein patterns were used to either support or debunk processes. But then again, Gilbert et al. (2012) report that, in a sample of 23 papers published in Molecular Ecology in 2011 (and other datasets), ~30% of datasets were either inconclusive or irreproducible in their analyses of population structure. This is a major issue that falls in our latter category of biological insignificance or unimportance, where contradictory inferences leave us unable to agree.
A lot of the problem, I think, has to do with the nature of the population process itself: settling on a single definition of it has been contentious and fraught with conflicting opinions. As a biologist, though, one quests for a simple yet all-encompassing definition of an ideal population, which I attempt here: “A population is a continuum of freely mating individuals of the same species that are statistically less distinguishable within a subpopulation, but differentiable between subpopulations”, with due respect to Wright (1949), Dobzhansky (1970), Lawson (2012), etc.
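One common way to put a number on “differentiable between subpopulations” is Wright’s fixation index, F_ST = (H_T - H_S) / H_T, which compares the heterozygosity expected in the pooled population (H_T) with the average expected within subpopulations (H_S). The snippet below is only a minimal sketch of that idea, assuming a single biallelic locus with known allele frequencies; the function names and the example frequencies are made up for illustration and are not taken from any of the studies cited here.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs, sizes=None):
    """Wright's F_ST for one biallelic locus.

    freqs: allele frequency in each subpopulation (illustrative input).
    sizes: optional relative subpopulation sizes; equal weights if omitted.
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = float(sum(sizes))
    weights = [s / total for s in sizes]

    # Allele frequency in the pooled (total) population.
    p_bar = sum(w * p for w, p in zip(weights, freqs))

    h_t = expected_heterozygosity(p_bar)  # pooled population
    h_s = sum(w * expected_heterozygosity(p) for w, p in zip(weights, freqs))

    if h_t == 0.0:
        # The locus is fixed everywhere, so there is nothing to differentiate.
        return 0.0
    return (h_t - h_s) / h_t


# Identical frequencies: subpopulations are indistinguishable at this locus.
same = fst([0.5, 0.5])
# Moderately different frequencies: partial differentiation.
partial = fst([0.2, 0.8])
# Alternative alleles fixed in each subpopulation: complete differentiation.
fixed = fst([0.0, 1.0])
```

Values near 0 mean the subpopulations are statistically hard to tell apart at this locus, and values near 1 mean they are sharply differentiated. Real analyses average over many loci and correct for sample size, which this sketch deliberately ignores.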
Perhaps understanding that populations are indeed continuums solves part of our problem: maybe there are no lines to draw and classify, or maybe the lines are correlated with barriers (geographical, linguistic, logistical, social, racial, etc.). But our quest for classification persists, and perhaps understanding how an ideal population would be classified would help us understand how the non-ideal populations that are ubiquitous across the tree of life evolve.