C. elegans: an elegant genome

On the second day of Christmas, my true love sent to me: the C. elegans (worm) genome. The lowly nematode worm is probably the “newest” widespread model organism, developed by Sydney Brenner and colleagues in the 1960s at the Laboratory of Molecular Biology (LMB) in Cambridge as something between the complexity of the fly and the simplicity of yeast.

It was an inspired choice: you could keep the worm in the laboratory easily (it eats a lawn of bacteria, very often E. coli), and setting up crosses was easy. Remarkably (and this shows how lucky Sydney is), it also has completely stereotypical development. Every adult C. elegans worm has an identical number of cells (John Sulston, who would later lead the worm genome project, was one of the key people to work this out). It is as if every cell has a name, with one lineage tree providing the single way of going from a genome to a collection of cells.

The techniques of genome mapping and sequencing grew up in the 1980s, led again by innovations from the LMB, this time in Fred Sanger’s group. Amazing pronouncements were made that we should sequence the human genome, but perhaps test the technology out on some simpler organisms first. John Sulston chose the worm, and set up what seemed like an insanely ambitious plan to sequence the whole of the ~100 Mb genome.

In the early 80s an American MD PhD, Bob Waterston, came over to map more worm genes and got hooked, first on the mapping and then on the sequencing, with John Sulston and Alan Coulson. This started a long-running collaboration between Bob and John that culminated in the human genome, and when Bob went back to St. Louis there was a constant flow of people between the two groups. Later on, when John had set up the Sanger Centre (later renamed the Sanger Institute) and Bob the Genome Sequencing Center, whole teams would move between the two sites.

And it was madness, in the late 80s, to state that you were going to sequence all of the worm genome. Remember, no bacterial genome had been sequenced at that time, and it seemed like there should perhaps be a bigger focus on technology development. But John and Bob both realised that it was feasible (with effort: a veritable “factory” of people) and that many of the innovations needed were small, not large. Many of the technical advances came from this drive. For example, the use of shotgun sequencing (random sampling of DNA pieces) on individual fragments of DNA, followed by a process of “finishing”, in which specific experiments were designed to close the remaining gaps, came to be standard (giving rise to a new laboratory role: “finishers”). Roger Staden’s “Staden package” was the first computational workbench for this.

And the worm was sequenced in 1998, a mere year after E. coli. Due to another quirk of biology, namely that C. elegans does not have complex, repeat-dense centromeres, it still remains the only metazoan genome with a complete end-to-end sequence. Both its genome and its development somehow capture the species name “elegans” well.

I had arrived as a young PhD student just before the final publication, and started to work a little, on the side, on worm genome annotation. Because of the organisation around the worm genome, the annotation had kept pace with the sequencing. The annotation happened in a rather amazing software system called ACeDB, which one would now call a NoSQL graph database, written by Richard Durbin (my then supervisor) and Jean Thierry-Mieg, a French physicist (you knew when you had hit a bug in Jean’s code, as it would crash with a series of French expletives on the console).

ACeDB was completely ahead of its time: it was a graph/document database, it used hyperlinks between display items (before the web), and it had integrated, customised graphical displays built on top of the database. The annotation of the worm happened via a blend of automated programs and scripts and dedicated manual effort, confirming, and sometimes tweaking, each exon and splice site by hand. My beast of a gene prediction program (GeneWise) started to pump out information, at hideous computational cost, to feed into this. The worm therefore also led the way in the concepts around annotation and information storage, with the first “modern” model organism database.

The worm community, both in genomics and more broadly, remains extremely tight. You can just about fit the entire community into one conference, and there is a sense of camaraderie that runs deep. Worms continue to lead a lot of developmental biology, in particular in neuronal circuits, where the worm remains the only organism with a complete map of its small, primitive (but perfectly formed) brain. The genome is just one part of the completeness of this organism, but it also has a huge part in the history of genomics.