4th genome of Christmas: the hexaploid bread wheat genome

The first technological innovation to radically change human society was agriculture. The ability to cultivate – rather than hunt or pick – food had a profound effect on everything from our immune system to our societal structures. It encouraged specialisation, favoured robust, complex inter-generational knowledge transmission and enabled the explosive growth of this bipedal ape.

3rd genome of Christmas: the Denisovan little finger

In the early 90s Svante Pääbo, a charismatic, energetic innovator, made a bold proposal: that to study human origins one would do well to sequence the DNA of ancient hominids, in particular those species which had gone extinct. After all, DNA could be detected in their bones, provided they were not too old and had been kept dry and cold.

C. elegans: an elegant genome

On the second day of Christmas, my true love sent to me: the C. elegans (worm) genome. The lowly nematode worm is probably the “newest” widespread model organism, developed by Sydney Brenner and colleagues in the 1960s at the Laboratory of Molecular Biology (LMB) in Cambridge as something between the complexity of fly and the simplicity of yeast.

It was an inspired choice: you could keep the worm in the laboratory easily (it eats a lawn of bacteria, very often E. coli), and setting up crosses was easy. Remarkably (and this shows how lucky Sydney is), it also has completely stereotypical development. Every adult C. elegans worm has an identical number of cells (John Sulston, who would later lead the worm genome project, was one of the key people to work this out). It is as if every cell has a name, with one tree providing the single way of going from a genome to a collection of cells.

The First Genome of Christmas: E. coli (and friends)

Inspired by a very boring train stoppage last year, I am going to add, one a day, to this list of great / interesting genomes until Christmas Day.

On the first day of Christmas, my true love sent to me:

Escherichia coli and its associated phages. This humble bacterium is one of our commensal organisms; it hangs out in our gut being, usually, useful to us. But the reason why every molecular biologist knows about this critter is that it is also the bedrock of DNA manipulation. Molecular biologists shuttle DNA from all sorts of different organisms through E. coli constantly. It is the assembly line for much of molecular biology – where you capture, grow up and extract DNA. The smell of the growth media used to grow E. coli infuses all molecular biology labs. E. coli has its own parasites – phages – which are viruses that infect E. coli, and these are as useful as their bacterial host.

Genomics and Big Data in Medicine

One of the great challenges – and opportunities – over the coming decade is the perfusion of molecular measurement, and accompanying data analysis, into general medicine. This will be nothing new for clinical genetics and other niche disciplines, but as medicine begins to mine the rich data streams from genomics, transcriptomics and metabolomics research, we will start running into some rather tricky integration problems. This is interesting both scientifically and socially, as a huge wave of technology pushes us to create clinical utility out of a confluence of molecular data, high-resolution imaging and data from continuous-sensing devices.

10,000 Up

I’ve just passed my 10,000th follower on Twitter, and similar to when I went past 5,000 followers this feels like a good point to reflect on this open, ‘blog-and-tweet’ world evolving around me. Many of the comments I made two years ago have stood the test of time: Twitter is still fundamentally a conversation, broadcast not just to your lunch queue but worldwide, and blogs remain lightweight, informal platforms for review and commentary. And as with any conversation you have to consider your audience first, and as with all public writing everyone still need and editor [sic].

Anatomy of a mainstream science piece

Last week, the Guardian published a Comment by me entitled ‘Why I’m sceptical about the idea of genetically inherited trauma’. In this blog post, I’d like to go through what happened behind the scenes when someone from the mainstream press asked for my views, what my thought process was before I started drafting a response, and why I believe we should all participate more in public discourse on science.

Untangling Big Data

“Big Data” is a trendy, catch-all phrase for handling large datasets in all sorts of domains: finance, advertising, food distribution, physics, astronomy and molecular biology – notably genomics. It means different things to different people, and has inspired any number of conferences, meetings and new companies. Amidst the general hype, some outstanding examples shine forth, and today sees an exceptional Big Data analysis paper by a trio of EMBL-EBI research labs – those of Oliver Stegle, John Marioni and Sarah Teichmann – that shows why all this attention is more than just hype.

Moving 20 Petabytes

EMBL-EBI’s data resources are built on a constantly running compute and storage infrastructure. Over the past decade that infrastructure has grown exponentially, keeping pace with the rapid growth of molecular data and the corresponding need for computation. Terabytes of data flow every day on and off our storage systems, making up the hidden life-blood of data and knowledge that permeates much of modern molecular biology. There is a somewhat bewildering complexity to all of this. We have 57 key resources: everything from low-level, raw DNA storage (ENA) through genome analysis (Ensembl and Ensembl Genomes), complex knowledge systems (UniProt) and 3D protein structures (PDBe). At minimum, over half a million users visit at least one of the EMBL-EBI websites each month, making 12 million web hits and downloading 35 Terabytes each day. Each resource has its own release cycle, with different international collaborations (e.g. INSDC, wwPDB, ProteomeXchange) handling the worldwide data flow.
