It’s a new year, and while many of the challenges of 2020 and 2021 don’t show any sign of letting up, I’m trying to pick up some habits that fell by the wayside while I juggled fully online semesters and social distancing via fieldwork. One of those is good old-fashioned blogging. Writing briefly about the papers I’m reading and the scientific concepts I’m mulling has been part of my scholarly process since graduate school. Over two years of the COVID-19 pandemic I’ve alternated between feeling like I had no spare time for blogging and feeling like I had no capacity for it, but I’ve felt the absence of the practice.
So, here’s one starting point: I’m going to begin posting roundups of the preprints and papers I’ve recently read, with brief reactions. Expect these every other week or so, with adjustments in frequency depending on my personal bandwidth. A paper’s inclusion in these posts will not constitute any kind of endorsement other than “this caught my attention long enough to download the PDF”, as perhaps this first batch will demonstrate.
Without further ado, here’s what I’ve read recently:
Álvarez-Carretero, S. et al. 2022. A species-level timeline of mammal evolution integrating phylogenomic data. Nature. doi: 10.1038/s41586-021-04341-1
Reports a time-calibrated phylogeny of 4,705 mammal species using information from 72 mammalian reference genomes, which allows unprecedented resolution of the timing of key events in the history of mammals. Highlighted by the authors:
- The common ancestor of placental mammals clearly predates the end-Cretaceous extinction of non-avian dinosaurs;
- The common ancestors of most modern placental orders (except the Afrotheria) date to the Paleogene, after the non-avian dinosaurs were off the scene;
- Using more loci for divergence time estimation results in narrower confidence intervals (no kidding!)
Johnson SE et al. 2022. Rapid, parallel evolution of field mustard (Brassica rapa) under experimental drought. Evolution. doi: 10.1111/evo.14413
Experimental evolution with field mustard, following something close to the protocols used for bacteria: 24 replicate populations started from a single source population and evolved for four generations with or without seasonal drought, with seeds banked at each generation so that ancestral populations could be “resurrected” for comparison to the evolved fourth-generation plants. The authors find that the populations evolved under the drought regime shifted to earlier (drought-avoiding) flowering, greater leaf area, and higher water-use efficiency under drought, largely in parallel across replicates, and they perform a formal test for the degree of parallel evolution following the thinking of Bolnick et al. (2018).
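For anyone unfamiliar with the vector framing of parallelism: one approach of the kind discussed by Bolnick et al. (2018) treats each replicate’s shift in mean trait values as a vector of phenotypic change, then asks how similar those vectors are in direction (the angle between them) and in magnitude (the difference in their lengths). Here’s a minimal Python sketch of those two quantities. It’s my own illustration, not the exact test Johnson et al. ran, and the trait values in the usage example are made-up toy numbers in standardized units, not data from the paper.

```python
import math
from itertools import combinations


def change_vector(ancestral_means, evolved_means):
    """Per-trait change (evolved minus ancestral) for one replicate population."""
    return [e - a for a, e in zip(ancestral_means, evolved_means)]


def vector_length(v):
    """Euclidean length: the overall magnitude of phenotypic change."""
    return math.sqrt(sum(x * x for x in v))


def angle_degrees(u, v):
    """Angle between change vectors: 0 = same direction, 90 = orthogonal, 180 = opposite."""
    length_u, length_v = vector_length(u), vector_length(v)
    if length_u == 0 or length_v == 0:
        raise ValueError("angle is undefined for a zero-length change vector")
    cos_theta = sum(x * y for x, y in zip(u, v)) / (length_u * length_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def pairwise_parallelism(vectors):
    """Mean angle and mean length difference over all pairs of replicates."""
    pairs = list(combinations(vectors, 2))
    mean_angle = sum(angle_degrees(u, v) for u, v in pairs) / len(pairs)
    mean_length_diff = sum(
        abs(vector_length(u) - vector_length(v)) for u, v in pairs
    ) / len(pairs)
    return mean_angle, mean_length_diff


if __name__ == "__main__":
    # Hypothetical standardized means: (flowering time, leaf area, water-use efficiency).
    ancestor = [0.0, 0.0, 0.0]
    evolved_lines = [[-1.0, 0.5, 0.4], [-0.9, 0.6, 0.3], [-1.1, 0.4, 0.5]]
    vectors = [change_vector(ancestor, line) for line in evolved_lines]
    angle, length_diff = pairwise_parallelism(vectors)
    print(f"mean pairwise angle: {angle:.1f} degrees; mean length difference: {length_diff:.2f}")
```

Small angles and small length differences across replicates would point toward parallel evolution; a formal test also needs a null expectation (for example, from angles between random vectors or from resampling), which this sketch leaves out.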
Rockweiler et al. 2021. The origins and functional effects of postzygotic mutations throughout the human lifespan. bioRxiv. doi: 10.1101/2021.12.20.473199
Reconstruction and analysis of postzygotic (somatic) mutations in different tissues from 948 human donors, including distinguishing prenatal from postnatal mutations. Mutations are identified as prenatal when they are found in multiple tissue types, because a mutation shared across tissues must date to a cell in the developmental common ancestry of those tissues.
- The highest median mutation accumulation is in liver, with sun-exposed skin (the lower leg) not far behind; lowest is seen in nervous system and reproductive tissues.
- Mutation accumulation in sun-exposed skin is lower for African American donors. Also, “unexpectedly”, mutation accumulation in skin is lower for males than for females.
- Most of the mutations that could be identified as prenatal, based on which sets of tissues shared them, trace to early embryogenesis: before gastrulation and in the differentiation of neural ectoderm.
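The core logic for calling a mutation prenatal is simple enough to sketch. Here’s a toy Python version, assuming a donor’s calls arrive as hypothetical (mutation_id, tissue) pairs: anything seen in more than one tissue is labeled prenatal, and anything seen in only one tissue is left as “single-tissue”, since it could be postnatal or a prenatal mutation confined to one developmental lineage. This is my illustration of the reasoning, not the authors’ pipeline; a real analysis also has to handle sequencing error, detection thresholds, and uneven depth across tissues.

```python
from collections import defaultdict


def classify_by_tissue_sharing(calls):
    """Label each mutation from one donor's (mutation_id, tissue) calls.

    A mutation detected in two or more tissues is assumed to have arisen in a
    cell ancestral to all of them, so it is labeled "prenatal". One detected
    in a single tissue is left as "single-tissue": it may be postnatal, or
    prenatal but confined to one developmental lineage.
    """
    tissues_by_mutation = defaultdict(set)
    for mutation_id, tissue in calls:
        tissues_by_mutation[mutation_id].add(tissue)
    return {
        mutation_id: ("prenatal" if len(tissues) > 1 else "single-tissue", sorted(tissues))
        for mutation_id, tissues in tissues_by_mutation.items()
    }


if __name__ == "__main__":
    # Hypothetical calls for one donor; IDs and tissue names are illustrative only.
    calls = [
        ("mut_a", "liver"), ("mut_a", "brain"), ("mut_a", "skin"),
        ("mut_b", "skin"),
        ("mut_c", "brain"), ("mut_c", "spinal cord"),
    ]
    for mutation_id, (label, tissues) in classify_by_tissue_sharing(calls).items():
        print(mutation_id, label, tissues)
```

The list of sharing tissues is kept alongside the label because that pattern is what lets you place a prenatal mutation in developmental time: sharing across very different tissues implies an earlier origin than sharing among closely related ones.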
Lou et al. 2021. A beginner’s guide to low-coverage whole-genome sequencing for population genomics. Molecular Ecology. doi: 10.1111/mec.16077
- Because much of population genetics is based on population-level characteristics, individual genotypes need not be resolved in many cases.
- Indeed, population-level estimates based on many low-coverage individuals can be better than those based on a few high-coverage individuals, if genotype uncertainty is accounted for in the estimation; simulation work suggests that, for a given total sequencing effort, power is maximized at 1-2x coverage per locus.
- Of course, any WGS approach needs a reference for mapping, though you could use a de novo transcriptome as a less challenging first step, and long-read technology is getting to the point that a de novo assembly isn’t so far out of reach anymore. [This last, I think, is getting pretty optimistic.]
- One key consideration with calculating genotype likelihoods from low-coverage data is that identification of rare variants becomes more method-dependent; working with low-coverage data will often mean setting higher thresholds for read quality, minimum depth of coverage, and variant frequency.
- Includes simulations exploring project design tradeoffs, which show a (slight) improvement in the accuracy of allele frequency estimation from low-coverage WGS over equivalent pooled sequencing.
- Discusses much of the workflow from the perspective of ANGSD, a workhorse package that can work from aligned reads or genotype likelihoods for everything from SNP calling to GWAS, but also lists alternatives for many analysis use-cases.
- Overall a very thorough introduction to the method and its considerations.
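To make the “account for uncertainty” point concrete, here’s a minimal Python sketch of the general idea behind genotype-likelihood-based allele frequency estimation: compute each individual’s likelihood for each genotype from its reference/alternate read counts, then estimate the population frequency by expectation-maximization without ever calling genotypes. It assumes a biallelic site, a single fixed per-read error rate, independent reads, and Hardy-Weinberg proportions. It is not ANGSD’s implementation; a real tool would use per-base quality scores and work in log space. The read counts in the usage example are made up.

```python
def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """P(read counts | genotype) for genotypes with 0, 1, or 2 alt alleles.

    Simplifying assumptions: reads are independent and every read has the same
    error rate. The binomial coefficient is dropped because it is the same for
    all three genotypes. Plain probabilities are fine at low coverage; a real
    tool would work in log space to avoid underflow at high depth.
    """
    likelihoods = []
    for g in (0, 1, 2):
        # Chance that a single read shows the alt allele, given genotype g.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return likelihoods


def estimate_allele_frequency(read_counts, error=0.01, max_iter=200, tol=1e-10):
    """EM estimate of the alt-allele frequency from per-individual (n_ref, n_alt).

    Genotypes are never called. Each iteration averages every individual's
    alt-allele count over its genotype posterior, using Hardy-Weinberg
    proportions at the current frequency as the prior.
    """
    gls = [genotype_likelihoods(n_ref, n_alt, error) for n_ref, n_alt in read_counts]
    f = 0.5  # starting guess
    for _ in range(max_iter):
        prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
        expected_alt = 0.0
        for gl in gls:
            joint = [lik * p for lik, p in zip(gl, prior)]
            # E-step: posterior mean number of alt alleles for this individual.
            expected_alt += (joint[1] + 2 * joint[2]) / sum(joint)
        new_f = expected_alt / (2 * len(gls))  # M-step
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


if __name__ == "__main__":
    # Toy (n_ref, n_alt) counts at one site for eight individuals at ~1-2x coverage;
    # (0, 0) means the individual had no reads at this site.
    counts = [(1, 0), (0, 1), (2, 0), (1, 1), (0, 0), (1, 0), (0, 2), (1, 0)]
    print(f"estimated alt-allele frequency: {estimate_allele_frequency(counts):.3f}")
```

Individuals with no reads at a site get equal likelihoods for all three genotypes, so they contribute only the prior and don’t shift where the estimate converges; that’s the sense in which uncertainty is carried through rather than forced into a hard genotype call.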