There are essentials, and there are essentials. Here are several bioinformatics tools that I use on a daily basis (with a Python bias):
-
Python – rapidly becoming the go-to high-level language in biology. If you aren’t happy with Python, try Perl or Ruby. If you aren’t happy with high-level languages, there’s always C. If you aren’t happy with C (or C++ or Fortran), there are any number of functional programming languages. View each language as a particular tool for a particular job; no single language is well suited to every task. Python is a very general language and is available in many formats suitable for a number of architectures here.
-
Kent Source – probably the most useful and legendary package of bioinformatics code for large-scale genomic data manipulation currently available. It is written in C and is very fast. It requires compilation and is available as a zip archive or via git.
-
PyFasta – an optimized library for rapidly accessing massive FASTA files. Also available at Bitbucket.
-
biopython – one of the oldest bioinformatics libraries for Python, it is a large library with a great deal of functionality. Available in a number of formats at the biopython downloads page.
-
oursql – for the most part, you are going to find that you need access to a database. Typically, that will be a MySQL database (although I also like PostgreSQL). I prefer oursql because of its buffering. Available from Launchpad.
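To make the idea of fast access to massive FASTA files a little more concrete, here is a minimal sketch of offset-based indexing using only the standard library. It is not PyFasta's implementation or API; the function names `build_fasta_index` and `fetch_sequence` and the demo data are hypothetical, chosen for illustration. The point is simply that one pass over the file can record where each record starts, so later lookups can seek directly instead of rescanning.

```python
import io


def build_fasta_index(handle):
    """Map each record name to the byte offset of its header line."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:  # ignore headers with no name
                index[fields[0].decode("ascii")] = offset
    return index


def fetch_sequence(handle, index, name):
    """Seek straight to one record and read only its sequence lines."""
    handle.seek(index[name])
    handle.readline()  # skip the header line
    chunks = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # reached the next record
        chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")


# Tiny in-memory stand-in for a large FASTA file (hypothetical data).
demo = io.BytesIO(b">chr1 test\nACGT\nACGT\n>chr2\nGGCC\n")
idx = build_fasta_index(demo)
seq_chr2 = fetch_sequence(demo, idx, "chr2")
seq_chr1 = fetch_sequence(demo, idx, "chr1")
```

With a real file you would open it in binary mode (`open(path, "rb")`) so that the offsets are plain byte positions. A production library would typically also persist the index and support slicing within a record, which this sketch leaves out.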
Runners Up
I find these interesting, but I have yet to put them into the daily rotation:
-
Pygr – bills itself as a “scalable bioinformatics interface”. Provides a number of wonderful ways to access the data and database of your choice.
-
sqlalchemy – a great SQL toolkit and object-relational mapper for Python. Works with a vast array of databases and database APIs.
-
disco – a Python framework implementing the map-reduce programming model (manuscript) on top of an Erlang engine. It is highly fault-tolerant and well suited to massive amounts of processing where the problem can be parallelized using the map-reduce approach. It also provides some handy tools like Discodex and the discodb object.
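If the map-reduce approach is unfamiliar, the shape of it can be sketched in a few lines of plain Python. This is a single-process illustration of the pattern only, not the disco API: the function names (`map_kmers`, `shuffle`, `reduce_counts`, `map_reduce`) and the example sequences are assumptions made for the sake of the example. A framework like disco would run the map and reduce steps across many workers and handle the grouping and failures for you.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(sequence, k=2):
    """Map step: emit a (kmer, 1) pair for every k-mer in one sequence."""
    return [(sequence[i:i + k], 1) for i in range(len(sequence) - k + 1)]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a total."""
    return key, sum(values)


def map_reduce(sequences, k=2):
    """Run map, shuffle, and reduce in-process over a list of sequences."""
    mapped = chain.from_iterable(map_kmers(s, k) for s in sequences)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, vals) for key, vals in grouped.items())


# Hypothetical input: count 2-mers across two short sequences.
counts = map_reduce(["ACGT", "CGTA"], k=2)
```

Because each sequence is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is what makes the approach a good fit for large genomic datasets.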