
Genotype-environment association is one of the most fundamental phenomena of landscape genomics. A species’ adaptation to its environment should mean that populations of the same species in different environments will evolve different frequencies of genetic variants that support adaptation to those different environments. So in principle, we should be able to find those locally adaptive genetic variants by “scanning” through many places in the genome to find the ones where individuals from different environments carry different variants.
There’s a catch, of course. Actually several. Isolation-by-distance and founder effects can mean that populations evolve genetic differences just as a consequence of being in different places. Even when you control for those effects, so-called genotype-environment association (GEA) is just that — an association. You can’t know for sure that a genetic locus showing a pattern of GEA has a functional relationship to traits that facilitate adaptation without independent data showing that function. A recent preprint reports a project in which plant geneticists did just that, performing experimental validation on GEA candidate loci, and they found a lot of those candidate loci don’t seem to hold up.
To demonstrate this, Yuxin Luo and coauthors in Jesse Lasky’s lab at Penn State University started by assembling a list of GEA candidates found in three prior studies of Arabidopsis thaliana. Arabidopsis is the lab rat of plants, a little weedy member of the mustard family that’s quick to grow in a greenhouse or a climate-controlled growth chamber, with a small diploid genome that has been extensively studied and annotated. It’s become something of a “field model” organism as multiple teams have taken insights and resources from lab and greenhouse experiments to studies of wild Arabidopsis populations.
Wild Arabidopsis populations are distributed from the Mediterranean to Siberia, and many studies have described local adaptation to that wide range of environmental variation. The three studies selected by Luo et al. used different statistical methods to test single nucleotide polymorphisms for associations with climate variation — mostly precipitation variation, but also temperature. We’d expect SNPs with such associations to lie in genes that shape drought stress tolerance or avoidance — by, for instance, flowering before the onset of a hot, dry summer — and freezing resistance. The definitive evidence for those roles would be an experiment showing that plants carrying different variants of a gene near a climate-associated SNP have different growth or reproductive success depending on the environment they experience — a genotype-by-environment interaction.
From the climate-associated SNPs identified in the three prior studies, Luo et al. picked 42 genes to test. They tested those loci for genotype-by-environment interactions by drawing on the Salk collection, a library of Arabidopsis lines derived from a single inbred ancestor, in which individual genes have been mutated into non-functionality by insertion of bacterial transfer DNA (T-DNA). These “knockout lines” should be genetically identical to each other except for the one mutation unique to each line, so differences in how plants grown from each line respond to an experimental environment should be due to each line’s mutated gene. That’s the kind of powerful experimental genetic resource you get to use when you work with Arabidopsis.
Luo et al. put that resource to use by growing mutant lines for each of their GEA candidate loci either with sufficient water or under drought-stress conditions. To the extent that GEA results reflect local adaptation to natural variation in water availability, plants with mutant variants at the GEA candidate loci should have different fitness than plants from the un-mutated ancestral line, and the magnitude or even direction of those differences should change in the drought-stress treatment: a genotype-by-environment interaction.
For the most part, this is not what they found. In the drought-stress treatment, all of the plant genotypes had reduced fitness, measured as aboveground biomass, flowering time, flower and fruit production, or fruit size. But these reductions occurred in similar ways for plants carrying mutations at most of the GEA candidate loci. Only two of the 42 loci showed genotype-by-environment interactions in fitness measures.
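To make the idea of a genotype-by-environment interaction concrete, here is a minimal sketch in Python of the simplest way to see one: a difference-of-differences on the mean fitness of each genotype in each treatment. The fruit counts, the genotype and treatment labels, and the function name `interaction_contrast` are all made up for illustration. This is not the analysis Luo et al. ran, and a real analysis would also need a statistical test of the interaction (for example, a linear model with a genotype-by-treatment term).

```python
from statistics import mean


def interaction_contrast(fitness):
    """Difference-of-differences for a 2x2 genotype-by-treatment design.

    `fitness` maps (genotype, treatment) to a list of per-plant fitness
    values. The result is the mutant-minus-wildtype difference under drought
    minus the same difference with sufficient water. A value near zero means
    the mutation has about the same effect in both treatments (no GxE).
    """
    cell = {key: mean(values) for key, values in fitness.items()}
    effect_watered = cell[("mutant", "watered")] - cell[("wildtype", "watered")]
    effect_drought = cell[("mutant", "drought")] - cell[("wildtype", "drought")]
    return effect_drought - effect_watered


# Hypothetical fruit counts, invented for this example.
# Case 1: drought hurts everyone, and the mutation costs the same in both
# treatments, so the treatment effects are parallel.
no_gxe = {
    ("wildtype", "watered"): [40, 42, 38],
    ("wildtype", "drought"): [20, 22, 18],
    ("mutant", "watered"): [36, 38, 34],
    ("mutant", "drought"): [16, 18, 14],
}

# Case 2: the mutation has no effect with sufficient water but a large cost
# under drought, so its effect depends on the environment.
gxe = {
    ("wildtype", "watered"): [40, 42, 38],
    ("wildtype", "drought"): [20, 22, 18],
    ("mutant", "watered"): [40, 41, 39],
    ("mutant", "drought"): [8, 10, 6],
}

print("no GxE contrast:", interaction_contrast(no_gxe))
print("GxE contrast:", interaction_contrast(gxe))
```

The first made-up dataset looks like what the post describes for most candidate loci: everything does worse under drought, but mutant and ancestral plants do worse by about the same amount. The second is the pattern you would hope to see at a locus that really matters for drought adaptation.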

On the one hand, that’s a lot of GEA candidates that don’t seem to pan out. On the other hand, as the authors point out, some candidate loci that don’t show significant genotype-by-environment interactions in their experiment may still have functional effects that aren’t captured by the experimental setup. First, the drought treatment may not reflect the natural environmental variation that drives local adaptation in real populations.
Second, the simplicity that makes the Salk library of mutations so useful for experimentation is itself a limitation, in a sense. In natural populations, variants at candidate loci may be adaptive only in the context of variation elsewhere in the genome. The Salk lines are engineered to carry individual mutations on otherwise identical “genetic backgrounds”, to isolate the effects of the mutations. So if a GEA candidate has an effect that varies across genetic backgrounds, it might not show the expected interaction with environment in the one background shared by the Salk lines. All of which is to say, traits shaped by multiple loci are challenging to work with!
So maybe unambiguous experimental validation for two out of 42 GEA candidates is actually pretty good, all things considered. Luo et al. also call out an observation that may be useful for those of us thinking about follow-up from an initial GEA “scan”: both of their validated loci were the top-ranked candidates, showing the strongest GEA signal of all loci tested in their respective source studies. They suggest that identifying candidates by rank (taking the top 1% or 0.01% of tested loci, say) may give better results for follow-up work than selecting them based on an a priori threshold of statistical significance (taking all loci with GEA at p < 0.05, say). That’s a nice, specific recommendation for future studies in species where it’s not so straightforward to do this kind of follow-up validation.
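The difference between those two ways of picking candidates is easy to sketch in Python. The locus names, the evenly spaced p-values, and the helper functions `select_by_threshold` and `select_by_rank` below are hypothetical, invented for illustration rather than taken from Luo et al. or from any GEA software.

```python
def select_by_threshold(pvalues, alpha=0.05):
    """Keep every locus whose GEA p-value falls below a fixed cutoff."""
    return [locus for locus, p in pvalues.items() if p < alpha]


def select_by_rank(pvalues, top_fraction=0.01):
    """Keep only the strongest-signal loci, as a fraction of all loci tested."""
    ranked = sorted(pvalues, key=pvalues.get)  # smallest p-value first
    n_keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:n_keep]


# Made-up scan of 200 loci with evenly spaced p-values from 0.005 to 1.0.
pvalues = {f"snp_{i:03d}": (i + 1) / 200 for i in range(200)}

by_threshold = select_by_threshold(pvalues, alpha=0.05)
by_rank = select_by_rank(pvalues, top_fraction=0.01)

print(len(by_threshold), "loci pass p < 0.05")
print(len(by_rank), "loci are in the top 1%:", by_rank)
```

With a fixed threshold, the candidate list grows with the number of loci tested and with any inflation in the p-values. A rank cutoff always returns a short list of the strongest signals, which is a more manageable starting point when every candidate means another experiment.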
References
Joost S, A Bonin, MW Bruford, L Després, C Conord, G Erhardt, and P Taberlet. 2007. A spatial analysis method (SAM) to detect candidate loci for selection: towards a landscape genomics approach to adaptation. Molecular Ecology. 16(18):3955-69. doi.org/10.1111/j.1365-294X.2007.03442.x
Luo Y, C Lorts, E Lawrence-Paul, and J Lasky. 2025. Experimental validation of genome-environment associations in Arabidopsis. bioRxiv. doi.org/10.1101/2025.01.08.631904
O’Malley RC, CC Barragan, and JR Ecker. 2015. A user’s guide to the Arabidopsis T-DNA insertion mutant collections. Methods in Molecular Biology. 1284:323-42. doi.org/10.1007/978-1-4939-2444-8_16; PMID 25757780