Everything is meta these days: metabarcoding, metagenomics, and now meta blog posts that are reviews of reviews. Just as nearly every ecologist now dabbles in the molecular world, most of those predisposed to molecular ecology and population genetics are at least dabbling in (or teaching, or reviewing) studies with an environmental DNA (eDNA) component. The number of metabarcoding, metagenomics, and eDNA studies has increased dramatically in recent years, and if you find yourself dabbling, or at the precipice of designing experiments, you probably need an all-encompassing reference to ground you. Fortunately, Deiner et al. (2017) put together a very helpful review for those of us drowning in eDNA and metabarcoding literature. They set their review apart from others by focusing on four aspects: a summary of eDNA studies of plants and animals, what is known and unknown about the spatial and temporal scales of eDNA information, guidelines and challenges for experimental design, and emerging applications.
What are the advantages of these types of studies?
The authors posit that the explosion of high-throughput sequencing (HTS) has vastly changed the way we survey biodiversity. Being able to associate a taxonomic identity with a DNA barcode has led to the eDNA metabarcoding revolution. This approach to surveying biodiversity has obvious advantages, including the ability to survey entire communities that were previously excluded because of the size or elusiveness of their members, thereby increasing diversity measurements and improving the resolution of taxonomic identifications and of the databases built on them. The broader scope of metabarcoding studies also allows applications at the community and ecosystem scale, like determining whether “observed community changes surpass acceptable thresholds for certain desired ecosystem functions,” and guiding resource management at ecosystem scales. Furthermore, the taxonomic scope that can be sampled is positively grandiose: one study used metabarcoding of five different genomic regions to survey three domains of life in topsoil (Drummond et al. 2015).
What types of studies have been done?
The authors emphasize the distinction between community DNA and eDNA. Community DNA studies collect target groups in bulk, separate the organisms from debris, pool them, and extract their DNA together. Sequences from a community DNA extraction can be traced back to the source organism and verified taxonomically, and Sanger sequencing of voucher specimens allows direct verification of species identity. This is not possible with eDNA sampling, so species verification relies on curated databases like GenBank, SILVA, and the Barcode of Life Data System (BOLD). With community DNA, you can infer that a detected species was present at that time and place; with eDNA, detecting a species’ DNA does not necessarily mean that the species itself was present at the time or place of sampling. For example, does the DNA you collect at a river site represent what is present here and now, or what is present upstream? Nevertheless, there have been a myriad of elegant studies in freshwater, marine, and terrestrial/aerial environments. Examples in the review include early detection of invasive populations, the use of terrestrial haematophagous leeches to collect DNA from their endangered or elusive vertebrate hosts in geographically remote regions, filtered air samples to collect pollen, and collections of spider webs, pollen from honey, and feces from generalist predators to estimate the biodiversity of hard-to-capture taxa.
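Because eDNA reads cannot be checked against vouchers, identification comes down to comparing each read with a curated reference database. The sketch below shows the simplest version of that idea: best-hit assignment by percent identity, with a minimum identity threshold. It assumes the query and reference sequences are already aligned to the same length, and the 97% threshold, taxon labels, and sequences are hypothetical. Real pipelines use proper aligners and usually resolve ties with lowest-common-ancestor assignment rather than keeping the first best hit.

```python
from typing import Dict, Optional


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return matches / len(a)


def assign_taxon(query: str, references: Dict[str, str],
                 min_identity: float = 0.97) -> Optional[str]:
    """Return the label of the best-matching reference, or None if no hit reaches min_identity.

    Ties keep the first reference encountered (a simplification; real tools often use LCA).
    """
    best_label: Optional[str] = None
    best_identity = 0.0
    for label, ref in references.items():
        identity = percent_identity(query, ref)
        if identity > best_identity:
            best_label, best_identity = label, identity
    return best_label if best_identity >= min_identity else None


if __name__ == "__main__":
    # Hypothetical toy reference "database"; these are not real barcodes.
    refs = {"Taxon_A": "ACGTACGTAC", "Taxon_B": "ACGTTCGTAA"}
    print(assign_taxon("ACGTACGTAC", refs))  # Taxon_A (exact match)
    print(assign_taxon("TTTTTTTTTT", refs))  # None (nothing close enough)
```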
eDNA work can also provide a glimpse into the ecological past. Whereas sampling surface water in freshwater systems provides contemporaneous abundance estimates, sediment cores from those same systems record both present and past biodiversity. Lake sediment cores have been used to examine biodiversity from 6 to 12.6 thousand years before present. Sediment from ice cores has been used to look at species abundances 2,000 years before present and to track past extinctions associated with glacial events. Ever wonder how DNA is preserved in sediment for such long periods? The authors explain that adsorption of nucleotides onto sediment particles shields them from degradation, especially oxidation and hydrolysis. In fact, eDNA concentrations in marine sediments have been shown to be three orders of magnitude higher than in seawater (Torti et al. 2015). So the next time you are having trouble getting high-quality DNA from your extractions, rub some dirt on them.
Is there evidence that sequence read abundance correlates with taxa abundances?
It seems like an obvious question, but this is really the heart of the matter. eDNA techniques are clearly useful when you are looking for the presence or absence of species that are hard to survey with conventional methods, but the Holy Grail is inferring species abundances from eDNA. The authors cite many examples from freshwater and marine aquaria and mesocosms where eDNA, amplified with species-specific primers and quantified by qPCR, was successfully used to measure relative population abundance. However, studies scaling this up to the community level are rare. Table 1 of the review lists many studies comparing eDNA richness estimates with traditional sampling or historical data, and in every case cited, the eDNA study produced similar or higher diversity estimates. In one example I find particularly interesting, Thomsen et al. (2016) showed that eDNA sequence read abundance correlated with the abundance and biomass of fishes caught in deep-sea trawls when reads were pooled to the family level. If these guys can get results from deep-sea trawls, there is really no excuse for the rest of us, is there? Except there are plenty of excuses (see the challenges below). Ultimately, the answer depends on the “ecology of the DNA” (Barnes and Turner 2016), i.e., its state, origin, fate, and transport, which can vary greatly between studies.
What are the challenges?
Figure 3 of Deiner et al. (2017) nicely summarizes the workflow of a typical metabarcoding study. It also illustrates the myriad challenges that can arise in study design, field collection, laboratory work, and data processing. It behooves any scientist planning a study with these methods to think carefully about the questions posed in each of these categories. Furthermore, although discussions of challenges in metabarcoding studies tend toward the technical, the authors stress that before those concerns come into play, it is imperative to be clear about the source material (community DNA vs. eDNA). Otherwise, both the downstream analysis pipelines and the interpretation of biodiversity patterns through space and time become complicated. Another important consideration is subsampling during processing steps (see Figure 2 in the review for a demonstration), which will likely result in the loss of rare sequence reads.
Figure 3 from Deiner et al. (2017)
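Why does subsampling lose rare reads? One common form of subsampling is rarefaction: drawing a fixed number of reads without replacement from each sample. The sketch below assumes a simple taxon-to-read-count dictionary (the taxa and counts are hypothetical). With 10 reads of `rare_2` out of 10,000, a draw of 500 reads has roughly a 60% chance of missing that taxon entirely.

```python
import random
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Subsample `depth` reads without replacement from a taxon -> read count table."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds total reads {total}")
    # Expand the table into one entry per read, then draw without replacement.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in drawn:
        rarefied[taxon] += 1
    return rarefied


if __name__ == "__main__":
    # Hypothetical sample: two common taxa and two rare ones.
    sample = {"common_1": 9000, "common_2": 950, "rare_1": 40, "rare_2": 10}
    for depth in (5000, 500, 50):
        sub = rarefy(sample, depth, seed=1)
        detected = [t for t, n in sub.items() if n > 0]
        print(depth, sub, "taxa detected:", len(detected))
```

Expanding every read into a list is fine for a toy example but wasteful for real libraries, where a hypergeometric draw or an established tool would be the better choice.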
The rise of HTS has made it feasible to multiplex large numbers of samples. However, multiplexing also creates opportunities for errors and biases, such as tag jumping, whereby indexes or adapters (oligos attached to sequences to identify each sample uniquely) become associated with sequences from another sample. Schnell et al. (2015) found this to occur in roughly 2.5% of cases, where sequences carried false tag combinations that led to erroneous assignment of sequences to samples. Tag jumping also seems to occur at a higher frequency on the HiSeq 4000 platform. HTS can also produce technical artifacts, like significant differences driven by samples being run on different machines or on different days (batch effects) rather than by biologically meaningful differences. Splitting each sample group across platforms or runs, rather than sequencing each group on its own run, is one way to minimize such effects. Mismatches between primers and the DNA of certain taxonomic groups (primer bias) are another common challenge: some taxa are preferentially amplified over others, or are missing altogether from downstream biodiversity estimates. Therefore, when designing new primers, testing them in silico, in vitro, and in situ is imperative. Finally, HTS favors the amplification of shorter products, so removing excess indexing primers with purification steps after QC checks is a must.
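Two of these checks lend themselves to a short sketch. The first estimates a tag-jump rate the way combinatorial (dual-index) designs make possible: if only some index pairs were assigned to samples, any read carrying an unassigned pair must have picked up a false tag combination. The second is a bare-bones in silico primer check that counts mismatches between a (possibly degenerate) primer and a target site using the standard IUPAC nucleotide codes. The index labels, primer, and sequences are hypothetical, and the mismatch counter ignores both mismatch position (3′-end mismatches matter most in practice) and alignment gaps.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

IndexPair = Tuple[str, str]

# Standard IUPAC nucleotide codes -> the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def unused_combination_rate(observed: Iterable[IndexPair], used: Set[IndexPair]) -> float:
    """Fraction of reads whose index pair was never assigned to any sample."""
    counts = Counter(observed)
    total = sum(counts.values())
    false_pairs = sum(n for pair, n in counts.items() if pair not in used)
    return false_pairs / total if total else 0.0


def primer_mismatches(primer: str, site: str) -> int:
    """Count positions where the target base is not allowed by the primer's IUPAC code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    return sum(1 for p, t in zip(primer.upper(), site.upper()) if t not in IUPAC[p])


def best_site_mismatches(primer: str, sequence: str) -> int:
    """Minimum mismatch count over every ungapped window of `sequence` (one strand only)."""
    k = len(primer)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the primer")
    return min(primer_mismatches(primer, sequence[i:i + k])
               for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    # Hypothetical design: two assigned index pairs.
    used = {("i5_A", "i7_1"), ("i5_B", "i7_2")}
    reads = [("i5_A", "i7_1")] * 50 + [("i5_B", "i7_2")] * 48 + [("i5_A", "i7_2")] * 2
    print(unused_combination_rate(reads, used))  # 0.02 (2 of 100 reads)

    print(best_site_mismatches("ACGYA", "TTACGTATT"))  # 0 (Y matches C or T)
```

Note that unused index pairs only reveal some of the tag jumps. A jump that happens to produce another sample's valid pair is indistinguishable from a real read, so this estimate is a lower bound on misassignment.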
Lack of taxonomic resolution can occur when the discriminatory power of the primers is weak. For example, many animals have been barcoded at the cytochrome c oxidase subunit I (COI) gene using conventional sequencing methods, which yield a gene fragment of ~400–600 bp depending on the primers used. If you barcode a prokaryote using cloning techniques, you will get the full 16S gene at about 1,450 bp. However, metabarcoding with HTS requires much shorter fragments, which lowers the marker’s ability to discriminate between taxonomic groups. Also, more bioinformatics pipelines and curated databases are available for bacteria and microbial eukaryotes than for macro-organisms. Although pipelines developed for microbes can be repurposed to preprocess data from macro-eukaryotes, the lack of comprehensive reference databases can be a challenge. On the other hand, an advantage of targeting megafaunal communities is that they are typically less diverse than microbial communities, so analyses require less computational time and effort. Species boundaries also tend to be better defined. Huzzah!
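The resolution problem can be illustrated with a toy experiment: take a set of reference barcodes, shorten them, and count how many taxa still have a unique sequence. This sketch simply keeps the first n bases of each reference, which is a simplification, because real amplicons are defined by primer binding sites rather than by prefixes. The three reference sequences are invented for illustration.

```python
from collections import Counter
from typing import Dict


def distinguishable_fraction(references: Dict[str, str], length: int) -> float:
    """Fraction of taxa whose barcode, cut to its first `length` bases, is unique among the references."""
    truncated = {taxon: seq[:length] for taxon, seq in references.items()}
    counts = Counter(truncated.values())
    unique = sum(1 for fragment in truncated.values() if counts[fragment] == 1)
    return unique / len(references)


if __name__ == "__main__":
    # Hypothetical reference barcodes (not real sequences).
    refs = {
        "Taxon_A": "ACGTACGTAAGG",
        "Taxon_B": "ACGTACGTCCTT",
        "Taxon_C": "ACGTTTGTAAGG",
    }
    for n in (12, 6, 4):
        print(n, distinguishable_fraction(refs, n))  # 1.0, then 1/3, then 0.0
```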
Are there standards of practice?
Kinda. There are standard barcoding markers defined by the Consortium for the Barcode of Life (CBOL), and if you want to compare your results with other studies, you need to use them. COI is the most common barcoding gene for many animal taxa, though with many exceptions; popular alternatives are 12S, 18S, 16S, and cytB. A combination of the plastid loci rbcL and matK, or ITS2, is the standard for most plant taxa, and 16S, spanning variable regions V3 and V4, is the most commonly used marker for prokaryotes. The catch-22 is that deviating from the standards is key to picking up previously unsurveyed taxa, but it also makes comparisons with curated databases difficult.
Quality control in the lab is of the utmost importance and may require higher stringency than researchers using more conventional methods are used to. For example, negative controls should be included not just at the PCR stage but at every stage of lab work, and they should be sequenced. Contamination in these controls is often below the detection limits of standard checks, but sequencing them can reveal demultiplexing errors, and their read counts can be used in statistical modeling to rule out false positive detections. Constructing mock communities from pooled DNA extracts and running them as positive controls alongside eDNA samples is good practice for standardization and comparison. Typically, species not expected in the study area are used, so that contamination from the positive controls can be detected.
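One simple way to use sequenced negative controls, offered here as an illustration rather than as the review's prescribed method, is to treat the largest read count each taxon reaches in any negative control as a per-taxon threshold and to discard detections in field samples that do not exceed it. The same logic applies to mock communities: if the mock is built from species not expected in the study area, any mock taxon that turns up in a field sample flags cross-contamination. Taxon names and counts are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def control_thresholds(controls: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Per-taxon maximum read count observed across all negative controls."""
    thresholds: Dict[str, int] = {}
    for control in controls:
        for taxon, n in control.items():
            thresholds[taxon] = max(thresholds.get(taxon, 0), n)
    return thresholds


def filter_sample(sample: Dict[str, int], thresholds: Dict[str, int]) -> Dict[str, int]:
    """Zero out taxa whose reads do not exceed their negative-control threshold."""
    return {t: (n if n > thresholds.get(t, 0) else 0) for t, n in sample.items()}


def mock_taxa_detected(sample: Dict[str, int], mock_taxa: Set[str]) -> List[str]:
    """Mock-community taxa (not expected in the field) that appear in a field sample."""
    return sorted(t for t in mock_taxa if sample.get(t, 0) > 0)


if __name__ == "__main__":
    negatives = [{"Taxon_B": 4}, {"Taxon_A": 1}]
    field = {"Taxon_A": 500, "Taxon_B": 3, "Mock_X": 2}
    cleaned = filter_sample(field, control_thresholds(negatives))
    print(cleaned)  # Taxon_B falls below its control threshold and is zeroed
    print(mock_taxa_detected(cleaned, {"Mock_X", "Mock_Y"}))  # ['Mock_X']: possible cross-contamination
```

A hard maximum-based threshold is conservative and can discard genuine low-abundance detections. That trade-off is one reason statistical models of false positives are attractive.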
Data analyses in these studies require a strong commitment to transparency, which may feel daunting given the new methodologies, the volume of data generated, and so on. The authors mention several important references that address standards of practice, like MIMARKS (minimum information about a marker gene sequence) and MIxS (minimum information about any (x) sequence) (Yilmaz et al. 2011). Goldberg et al. (2016) offer a thorough breakdown of recommended reporting standards and challenges specific to eDNA studies in aquatic environments. In addition, Sandve et al. (2013) (not mentioned in the review) propose ten simple rules for reproducible computational research in general, which should be applied to eDNA studies. These include keeping track of how every result was produced, recording the version of every program used, recording all intermediate results, avoiding manual manipulation of data, putting all custom scripts under version control, and storing the raw data behind all plots. Some of these outputs can be deposited in repositories like Dryad, GitHub, or figshare. Compliance with these standards varies, however, especially since this type of research is published in many different journals, each with its own standards and requirements. The authors strongly recommend increasing transparency in published articles, though this can fly in the face of Reviewer 2, who wants you to cut the length of your manuscript by a third. Much like this blog post.
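Several of the Sandve et al. rules (record how each result was produced, note every program version, keep raw inputs traceable) can be partly automated. Below is a minimal sketch that builds a JSON-serializable provenance record for one pipeline step, including a SHA-256 checksum of each input file so you can later confirm exactly which data produced a result. The step name, command string, file name, and tool-version dictionary are hypothetical placeholders for your own pipeline, and this complements version control of your scripts rather than replacing it.

```python
import hashlib
import json
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in chunks so large FASTQ files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(step: str, command: str, inputs: Iterable[Path],
                      tool_versions: Dict[str, str]) -> dict:
    """Collect what is needed to trace how one intermediate result was produced."""
    return {
        "step": step,
        "command": command,
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "tool_versions": tool_versions,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reads = Path(tmp) / "sample1_R1.fastq"  # hypothetical input file
        reads.write_text("@read1\nACGT\n+\nIIII\n")
        record = provenance_record(
            step="quality_filter",                   # hypothetical step name
            command="my_filter --min-len 100",       # hypothetical command
            inputs=[reads],
            tool_versions={"my_filter": "0.0.0"},    # hypothetical tool/version
        )
        json.dump(record, sys.stdout, indent=2)
```

Writing one such record next to every intermediate file is a lightweight way to satisfy "record all intermediate results" without a full workflow manager.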
Where are the gaps in knowledge?
The authors give several examples of knowledge gaps in metabarcoding studies (as of publication date). The following are some that caught my eye:
- Batch effects have been shown in 16S bacterial diversity studies, but it is unknown whether they are prevalent in animal and plant studies.
- eDNA studies surveying living aquatic plant communities.
- Estimating the sources of eDNA in lake surface water from the lake’s catchment, and relating them to the diversity that occurs locally.
- Macro-organisms known to inhabit groundwater (gastropods, isopods, fishes, etc).
- Longitudinal transport of animal and plant DNA in marine environs.
- Simulation studies on noisy data sets, testing how well they conform to neutral theory parameters and how noise affects rank abundance curves, to estimate the expected error distribution around estimates.
- Coupling distribution or occupancy modeling with eDNA findings to improve species richness estimates. This technique is still rare in eDNA studies.
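The occupancy-modeling idea in the last bullet rests on a simple piece of arithmetic. If a species is present at a site and each eDNA replicate detects it independently with probability p, then the chance of detecting it at least once in K replicates is 1 − (1 − p)^K. A single-season site-occupancy model then combines this with the probability ψ that the site is occupied at all. The sketch below implements those two pieces under the usual simplifying assumptions (constant p, independent replicates, no false positives, which eDNA data may well violate). The ψ and p values are illustrative, not estimates from any study.

```python
import math
from typing import Sequence


def cumulative_detection(p: float, k: int) -> float:
    """P(at least one detection in k replicates | species present)."""
    return 1 - (1 - p) ** k


def replicates_needed(p: float, target: float = 0.95) -> int:
    """Smallest k with cumulative_detection(p, k) >= target."""
    if not (0 < p < 1 and 0 < target < 1):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))


def site_likelihood(history: Sequence[int], psi: float, p: float) -> float:
    """Likelihood of one site's 0/1 detection history under a single-season occupancy model.

    Occupied: psi * p^d * (1 - p)^(k - d). An all-zero history can also
    arise because the site is unoccupied, which adds (1 - psi).
    """
    k, d = len(history), sum(history)
    present = psi * p ** d * (1 - p) ** (k - d)
    return present if d > 0 else present + (1 - psi)


if __name__ == "__main__":
    print(cumulative_detection(0.5, 3))  # 0.875
    print(replicates_needed(0.5))        # 5 replicates for at least 95% detection
    print(site_likelihood([0, 0, 0], 0.6, 0.5))
    print(site_likelihood([1, 0, 1], 0.6, 0.5))
```

Multiplying `site_likelihood` across sites and maximizing over ψ and p gives occupancy estimates that account for imperfect detection, which is exactly what naive eDNA presence/absence tallies ignore.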
So take heart if you often find yourself up against that inner voice telling you that all the cool ideas have already been taken. In the contemplative and melancholy words of Gillian Welch, “there’s gotta be a song left to sing / ’Cause everybody can’t have thought of everything.”
All in all, this review is a helpful, comprehensive reference covering experimental design considerations in the field, in the laboratory, and in data analysis; common loci used for many taxa; and a summary of eDNA studies in different habitats and what they measured. It is worth mentioning that Pompanon et al. (2012) published a paper titled “Who is eating what: diet assessment using next generation sequencing” that addresses many of the concepts and challenges found in Deiner et al. (2017), while providing more detail and explaining more of the underlying concepts. If you are looking for a good starting point for teaching this field, I would definitely include that paper, and perhaps even start there.
Now I shall leave you, dear reader, with a couple of links to impressive projects and technologies developed at the Monterey Bay Aquarium Research Institute, a leader in the development and implementation of in situ eDNA studies in the deep sea.
Barnes, M. A., & Turner, C. R. (2016). The ecology of environmental DNA and implications for conservation genetics. Conservation Genetics, 17, 1–17.
Deiner, K., Bik, H. M., Mächler, E., et al. (2017). Environmental DNA metabarcoding: Transforming how we survey animal and plant communities. Molecular Ecology, 26, 5872–5895. https://doi.org/10.1111/mec.14350
Drummond, A. J., Newcomb, R. D., Buckley, T. R., et al. (2015). Evaluating a multigene environmental DNA approach for biodiversity assessment. GigaScience, 4, 46. https://doi.org/10.1186/s13742-015-0086-1
Goldberg, C. S., Turner, C. R., Deiner, K., Klymus, K. E., Thomsen, P. F., Murphy, M. A., … Cornman, R. S. (2016). Critical considerations for the application of environmental DNA methods to detect aquatic species. Methods in Ecology and Evolution, 7, 1299–1307.
Pompanon, F., Deagle, B. E., Symondson, W. O., Brown, D. S., Jarman, S. N., & Taberlet, P. (2012). Who is eating what: diet assessment using next generation sequencing. Molecular Ecology, 21, 1931–1950. https://doi.org/10.1111/j.1365-294X.2011.05403.x
Sandve, G. K., Nekrutenko, A., Taylor, J., & Hovig, E. (2013). Ten simple rules for reproducible computational research. PLoS Computational Biology, 9(10), e1003285. https://doi.org/10.1371/journal.pcbi.1003285
Schnell, I. B., Sollmann, R., Calvignac‐Spencer, S., Siddall, M. E., Yu, D. W., Wilting, A., & Gilbert, M. T. P. (2015). iDNA from terrestrial haematophagous leeches as a wildlife surveying and monitoring tool – prospects, pitfalls and avenues to be developed. Frontiers in Zoology, 12, 1.
Thomsen, P. F., Møller, P. R., Sigsgaard, E. E., Knudsen, S. W., Jørgensen, O. A., & Willerslev, E. (2016). Environmental DNA from seawater samples correlate with trawl catches of subarctic, deepwater fishes. PLoS ONE, 11, e0165252.
Torti, A., Lever, M. A., & Jørgensen, B. B. (2015). Origin, dynamics, and implications of extracellular DNA pools in marine sediments. Marine Genomics, 24, 185–196.
Yilmaz, P., Kottmann, R., Field, D., Knight, R., Cole, J. R., Amaral‐Zettler, L., … Cochrane, G. (2011). Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nature Biotechnology, 29, 415–420.