
"All models are wrong, but some are useful," is a basic operational principle of population genetics. The aphorism is attributed to George Box, who cited the ideal gas law as an example, but it crops up in every attempt we make to relate on-the-ground biology and ecology to observed patterns of diversity in DNA sequences.
My first exposure to this issue was probably reading Whitlock and McCauley’s 1999 review of the tricky relationship between pairwise genetic differentiation and actual migration rates. Classic theory by none other than Sewall Wright related the differentiation index FST to the effective migration rate as FST ≈ 1/(4Nm+1) — but for that relationship to hold, every population of a sampled species needed to be the same effective size, and individuals needed to move between all pairs of populations at exactly the same rate. It is, to put it mildly, rather unlikely that any naturally distributed species might meet those conditions.
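To get a feel for what Wright's approximation implies, here is a minimal Python sketch of the relationship quoted above and its inverse, which is the "indirect" estimate of migration that Whitlock and McCauley caution against. The function names are mine, chosen for illustration, and the numbers only mean anything if the island-model assumptions actually hold.

```python
def fst_from_nm(nm: float) -> float:
    """Expected FST under Wright's island model, FST ~ 1 / (4Nm + 1)."""
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (4.0 * nm + 1.0)


def nm_from_fst(fst: float) -> float:
    """Invert the same approximation to get an 'indirect' estimate of Nm."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must be in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # Hypothetical values of Nm (effective migrants per generation).
    for nm in (0.1, 1.0, 10.0):
        print(f"Nm = {nm:5.1f}  ->  FST = {fst_from_nm(nm):.3f}")
```

Under this approximation, a single effective migrant per generation (Nm = 1) corresponds to FST = 0.2, which is part of why the formula is so tempting to apply.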
In the years since Whitlock and McCauley (1999), population genetics has accumulated a wealth of methods to provide more realistic models of population structure. But most of these still rely on treating populations as discrete patches linked by a web of migration — and while there are taxa that really are distributed in patches, environmental variation in the real world is much less tidy. A new paper published online ahead of print in Genetics takes a dive into the effects of modeling patchy distributions when your data comes from a continuously distributed world, and it suggests that this particular wrong model may be less useful than we’ve assumed.
C.J. Battey, Peter Ralph, and Andrew Kern, the paper’s coauthors, identify correlations among samples as the main problem with sampling from a continuously distributed landscape. Analyses that assume discrete, patchy populations also typically assume that each patch is well-mixed, and that sampling within patches isn’t affected by spatial correlations or nonrandom mating. This kind of correlation among sampled individuals is a big hazard for analyses testing for associations between genotypes and phenotypes or environments, that is, genome-wide association studies (GWAS) and genotype-environment association analyses (GEAs). A lot of thought has been given to statistical methods that control for it, but those methods often start from a model of patchy populations.
To examine how autocorrelation within (assumed) patches might affect a range of population genetic analyses, Battey et al. simulated populations of individuals distributed in continuous space, using SLiM v3.1. Simulated individuals dispersed and selected mates randomly across the landscape, but with probabilities defined by a Gaussian spatial kernel; that is, they were less likely to disperse to a location or select a mate with increasing distance from their current position. For comparison, they simulated populations of similar size with truly random mating among all individuals.
To test how continuous spatial structure might confound a GWAS, the coauthors simulated nongenetic phenotype values for all individuals, assigned either independently of individuals’ spatial positions or according to one of three schemes in which phenotypes were determined by location on the landscape. Most phenotypes of interest are affected by spatial environmental variation in this way. Dealing with this problem is one of the fundamentals of genetic experimental design, but it is also a factor that is impossible to fully account for in GWAS with human data. From the simulated populations, they sampled individuals for the analyses themselves, either distributing the sample across the landscape, concentrating sampling at a single central point, or sampling from four equidistant points.
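The idea of a Gaussian spatial kernel is easier to see in code. What follows is a rough Python stand-in, not the authors' SLiM recipe: the function names, the square map, and the choice to clamp dispersers at the map edge are all my assumptions for illustration.

```python
import math
import random


def kernel_weight(distance: float, sigma: float) -> float:
    """Unnormalized Gaussian weight: 1 at distance 0, decaying with distance."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def disperse(x, y, sigma, width, rng):
    """Offspring location: parent position plus Gaussian noise.

    Clamping to the [0, width] square is an assumed edge rule, not the paper's.
    """
    new_x = min(max(x + rng.gauss(0.0, sigma), 0.0), width)
    new_y = min(max(y + rng.gauss(0.0, sigma), 0.0), width)
    return new_x, new_y


def choose_mate(focal, others, sigma, rng):
    """Pick one mate from `others`, weighting nearby individuals more heavily."""
    weights = [kernel_weight(math.dist(focal, other), sigma) for other in others]
    return rng.choices(others, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical population: 200 individuals scattered over a 10 x 10 map.
    population = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    focal, others = population[0], population[1:]
    mate = choose_mate(focal, others, sigma=0.5, rng=rng)
    print("distance to chosen mate:", round(math.dist(focal, mate), 2))
    print("offspring location:", disperse(*focal, sigma=0.5, width=10.0, rng=rng))
```

Shrinking `sigma` makes mates and offspring cluster more tightly around the focal individual, which is the kind of local structure a well-mixed patch model ignores.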

As is probably not surprising, space makes a difference in almost every aspect of population genetic analysis Battey et al. consider. In spatial simulations with small “neighborhood size” (that is, when simulated individuals were much more likely to find mates close to them, and their offspring were much more likely to stay near their parents), most standard population genetic summary statistics deviated from what was seen in the random mating comparison, across all three sampling strategies. As one pointed example, Tajima’s D and the related θπ statistic, measures of genetic diversity that we use to infer the activity of natural selection, are both inflated by strong spatial structuring in a continuously distributed landscape.
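For readers who want a number to attach to "neighborhood size": Wright's neighborhood size in two dimensions is conventionally 4πρσ², where ρ is population density and σ is the dispersal distance. A minimal sketch follows, with made-up parameter values; that this matches the exact parameterization used in the simulations is my assumption.

```python
import math


def neighborhood_size(density: float, sigma: float) -> float:
    """Wright's neighborhood size in two dimensions: 4 * pi * density * sigma^2."""
    return 4.0 * math.pi * density * sigma ** 2


if __name__ == "__main__":
    # Hypothetical values: 5 individuals per unit area, varying dispersal distance.
    for sigma in (0.5, 1.0, 2.0):
        print(f"sigma = {sigma:3.1f}  ->  Nb = {neighborhood_size(5.0, sigma):7.1f}")
```

Because σ enters as a square, halving dispersal distance cuts neighborhood size by a factor of four.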

The picture isn’t much better for GWAS. Battey et al. find that statistically significant genotype-phenotype correlations are pervasive under any of their approaches to simulating environmentally induced phenotypes. (And, indeed, because the phenotype in their GWAS testing is always induced by spatially structured environmental conditions, any significant association is a false positive.) Correcting for population structure by using principal components of the genotype matrix as covariates (the most common approach to try to account for this issue in GWAS) reduces but does not eliminate false positive associations at small neighborhood sizes, where as many as 1% of simulated SNPs may still show associations. That 1% false positive rate sounds survivable until you consider that GWAS typically involves thousands or millions of tested markers.
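The arithmetic behind that worry is short enough to write out. This sketch just multiplies a false positive rate by a marker count; the marker counts are hypothetical round numbers, not figures from the paper.

```python
def expected_false_positives(n_markers: int, false_positive_rate: float) -> float:
    """Expected number of spurious hits if every tested marker is truly null."""
    return n_markers * false_positive_rate


if __name__ == "__main__":
    # Hypothetical marker counts spanning a small SNP panel to a dense array.
    for n_markers in (10_000, 100_000, 1_000_000):
        hits = expected_false_positives(n_markers, 0.01)
        print(f"{n_markers:>9,} markers  ->  about {hits:,.0f} false positives")
```

At a million markers, a 1% rate means on the order of ten thousand spurious associations to sort through.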
What can working geneticists do to cope with this? Battey et al. have some concrete recommendations. For more accurate descriptions of genetic diversity and structure, they say to sample broadly, and at varying spatial scales. Covering multiple spatial scales would allow for comparisons between summary statistics calculated range-wide versus on local-level samples, and differences between those scales would indicate possible issues with poor dispersal across a continuous landscape.
For GWAS, there’s good news and bad news. In some sense, considering only false positives is a conservative test: loci that were truly, causally associated with the phenotype might be so much more strongly associated than the false positives that they would reliably show up as genome-wide outliers, and the false positives wouldn’t mask them. But maybe not! That would depend on the strength of allelic effects, among other factors that are typically unknowns to be estimated in the context of a real GWAS project. For the increasingly popular study design in which effect estimates from many weakly associated loci are summed into a polygenic score, the results of these simulations are particularly worrying, because polygenic scores could end up being calculated with a lot of false positives in the mix. Battey et al. suggest that a standard approach could be to test for allelic associations with samples’ locations of origin. Finding significant associations with latitude and longitude would be a red flag for interpretation of associations with actual phenotypes of interest.
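Here is one way that location check might look for a single SNP, as a hedged sketch: I use a permutation test on the Pearson correlation between genotype and a spatial coordinate. That particular test, the function names, and the toy data are my choices for illustration, not a method specified by Battey et al.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(genotypes, coordinate, n_perm=999, seed=0):
    """Two-sided permutation p-value for a genotype-coordinate correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, coordinate))
    shuffled = list(coordinate)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between genotype and location
        if abs(pearson_r(genotypes, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 30 samples along a latitudinal transect, genotypes coded 0/1/2.
    latitude = [float(i) for i in range(30)]
    clinal_snp = [0] * 10 + [1] * 10 + [2] * 10
    print("r =", round(pearson_r(clinal_snp, latitude), 3))
    print("p =", permutation_p_value(clinal_snp, latitude))
```

In practice you would run a check like this across all markers and for both latitude and longitude; a genome full of small p-values here suggests that spatial structure, not the phenotype, is driving many of the GWAS hits.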
For those of us who work with species that allow experimental designs that can directly control for environmental effects, this paper is added motivation to triangulate among multiple lines of evidence when we’re deciding what loci most likely contribute to a trait. For studies on long-lived, hard-to-experiment-upon taxa — and humans, the trickiest model organism of all — it’s one more reason to treat results with a high degree of caution.
References
Battey CJ, PL Ralph and AD Kern. 2020. Space is the place: Effects of continuous spatial structure on analysis of population genetic data. Genetics doi: 10.1534/genetics.120.303143
Haworth S, R Mitchell, L Corbin, et al. 2019. Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nature Communications 10, 333. doi: 10.1038/s41467-018-08219-1
Whitlock MC and DE McCauley. 1999. Indirect measures of gene flow and migration: FST ≠ 1/(4Nm+1). Heredity 82: 117-125. doi: 10.1046/j.1365-2540.1999.00496.x
