Phylogenetic studies are crucial for ecology and evolution. However, their usefulness for comparative biology or meta-analyses can vary considerably. In particular, the inclusion of unidentified species (“Balanus sp.”) obstructs their use in comparative studies. How can I attach life history or morphological data collected in previous studies to an unidentified species? There is no way of telling whether the Balanus sp. of one study is the same as the Balanus sp. of another. In most cases, I would guess not… You see the problem?
I am currently working on assembling the “Open Tree of Barnacles”, which means I am obtaining published barnacle phylogenies, curating them to match current taxonomy, and then letting the awesome people from the Open Tree of Life project do their tree assembly magic. During this work, I felt frustrated at times when dealing with “wasted trees”: trees that a lot of effort and data went into, but from which little information comes out, namely those with a lot of unidentified species. These unidentified species appear in the Open Tree of Life, but cannot be linked to species used in other studies or to the taxonomy underlying the Open Tree of Life. Moreover, such unidentified species have no value in comparative studies, one of the big goals of the Open Tree of Life project!
Here are some tips to make the most of your phylogenetic study, drawn from my curation work on the Open Tree of Life project and my interest in comparative biology.
Before you begin a phylogenetic study…
- Think about including a taxonomist to help identify species and describe species discovered during the study.
- Familiarize yourself with the current taxonomy, especially current species names.
- Deposit specimens in a museum or another official collection. Basically, make them accessible to other researchers.
While you are doing the hard work…
- Use the same IDs throughout the study, e.g. let the same ID be part of the specimen ID, the sequence name and the phylogenetic tree tip label.
- Make sure sequences can be linked to tree tips as well as to specimens. This will not only help you at the time of publication, when you need to make tables for all this information, but also allows sequences to be linked to trees and more data (be it molecular or physical trait data) to be added to particular individuals.
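One way to keep yourself honest about consistent IDs is a small script that cross-checks them before you submit anything. The following is only a minimal sketch in Python using the standard library. The IDs (BAL001 and so on), the `specimen_id` column name and the example data are all hypothetical, and the Newick handling assumes simple, unquoted tip labels without spaces; a real project would probably use a proper tree parser instead.

```python
import csv
import io
import re


def specimen_ids(csv_text, column="specimen_id"):
    """Collect IDs from a specimen table (the column name is an assumption)."""
    return {row[column].strip() for row in csv.DictReader(io.StringIO(csv_text))}


def fasta_ids(fasta_text):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def newick_tip_labels(newick):
    """Collect tip labels from a Newick string.

    Tips directly follow "(" or ","; internal node labels follow ")" and are
    therefore skipped. Assumes unquoted labels without spaces.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def id_mismatches(specimens, sequences, tips):
    """List every ID that is absent from at least one of the three sources."""
    all_ids = specimens | sequences | tips
    return {
        "missing_specimen": sorted(all_ids - specimens),
        "missing_sequence": sorted(all_ids - sequences),
        "missing_tip": sorted(all_ids - tips),
    }


# Hypothetical example data, for illustration only.
SPECIMENS = "specimen_id,locality\nBAL001,site A\nBAL002,site B\nBAL003,site C\n"
FASTA = ">BAL001 COI\nACGT\n>BAL002 COI\nACGA\n"
TREE = "((BAL001:0.1,BAL002:0.2):0.05,BAL004:0.3);"

report = id_mismatches(
    specimen_ids(SPECIMENS), fasta_ids(FASTA), newick_tip_labels(TREE)
)
for problem, ids in report.items():
    print(problem, ids)
```

In this made-up example, BAL004 sits in the tree without a specimen or a sequence, and BAL003 has a specimen record but never made it into the alignment or the tree. Those are exactly the loose ends that are easy to fix before publication and nearly impossible to fix afterwards.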
After the tree is made, but before you forget all about it…
- Deposit sequences and trees in appropriate databases, e.g. sequences in NCBI’s GenBank and trees in Open Tree of Life and/or TreeBASE.
- Deposit metadata (when and where specimens were collected, how the tree was generated, what kind of data the trees are based on) with the sequence and tree data, and separately in spreadsheets, which can be made available on Dryad or figshare.
- If you identified new species based on molecular data, make an effort to describe them, either in collaboration with a taxonomist or by doing it yourself.
I know that describing species is hard work*, but the benefits are huge: A) You get to name a species! B) Your work will get cited because you described a new species. C) Your work will get cited because your phylogeny is now very useful for comparative studies. D) Your new species may give rise to work on its ecology or evolution – in which case your work would get cited. E) If you are anything like me, you feel frustrated by seeing unnamed things – after all, categorizing things is a defining trait of the biologist. Imagine the satisfaction of having this “sp.” turn into a name!
*Dear ICZN, I do not mean to encourage the rogue naming of taxa that may not be “real” species. However, I have come to the conclusion that naming species allows us to start the discussion on whether a species is real or not. Without a name, we have no common ground for discussion. Therefore, hail to the naming of species, and to making this process easier!