Sharing clinical data: everyone wins

Patients who contribute their data to research are primarily motivated by a desire to help others with the same plight, through the development of better treatments or even a cure. Out of respect for these individuals, and to uphold the fundamental tenets of the scientific process, I’d like the clinical trials community to shift its default position on data sharing and reuse to align to data availability on publication, similar to the life science community. This will enable more robust, rigorous research, create new opportunities for discovery and build trust between patients and scientists.

Advice on Big Data Experiments and Analysis, Part I: Planning

Biology has changed a lot over the past decade, driven by ever-cheaper data gathering technologies: genomics, transcriptomics, proteomics, metabolomics and imaging of all sorts. After a few years of gleeful abandon in the data generation department, analysis has come to the fore, demanding a whole new outlook and on-going collaboration between scientists, statisticians, engineers and others who bring to the table a very broad range of skills and experience.

In defence of model organisms

I have written about the rise of human as a first-class model organism, and am an enthusiastic user of this outbred, large vertebrate, which can walk right into pre-funded phenotyping centres (hospitals). However, some scientists are (somewhat flippantly) predicting ‘the demise of all non-human model organisms’ completely, only conceding the necessity for using mouse in impossible-in-human verification experiments. Although such positions tend to be put forward in jest, their underlying argument resonates: given our obsession with human health, and how much we can do in humans – with broad outbred genetics, iPSC cell lines and organoids – why should we bother with other systems?

12th genome of Christmas: The platypus

In 1799 George Shaw, the head of the Natural History Museum in London, received a bizarre pelt from a Captain in Australia: a duck bill attached to what felt like mole skin. Shaw examined the specimen and wrote up a description of it in a scientific journal, but he couldn’t help confessing that it was “impossible not to entertain some doubts as to the genuine nature of the animal, and to surmise that there might have been practised some arts of deception in its structure.” Hoaxes were rife at the time, with Chinese traders stitching together parts of different animals – part bird, part mammal – to make artful concoctions that would trick European visitors. Georgian London was becoming rather skeptical of these increasingly fantastical pieces of taxidermy.

11th genome of Christmas: Us

Ever since the discovery of DNA as the molecule responsible for genetics, in particular when it became clear that the ordering of the chemical components in this polymer was the information that DNA stored, scientists have dreamt about determining the full sequence of the human genome. For Francis Crick, who co-discovered the structure of DNA (along with James Watson, using data from Rosalind Franklin) this would be the final step towards unifying life and chemistry: demystifying the remarkable process that leads to us and all other living creatures. Back in 1953 this was a fantasy, but slowly and steadily over the ensuing decades it became a reality.

The first step was developing a routine way to determine the order of the chemicals in the DNA polymer: sequencing. Fred Sanger, a gifted scientist and one of the very few people with two Nobel Prizes in the same field under his belt, developed dideoxy-sequencing (a.k.a. “Sanger sequencing”) at the LMB in the 1970s. His laboratory, along with neighbouring LMB labs including Sydney Brenner’s, produced a new generation of scientists: John Sulston, Bart Barrell, Roger Staden and Alan Coulson, who forged ahead towards the seemingly unobtainable goal of sequencing whole organisms – with human in their sights. First, they did the different bacteriophages (see my First Genome of Christmas). Then, in the 1980s, John Sulston and colleagues started on mapping and then sequencing the worm (see the Second Genome of Christmas).

Of course this was not just a UK effort; many US scientists were involved in genomics. A scientist and technology developer, Lee Hood, looked at how to remove the radioactivity that came with Sanger sequencing, and created fluorophore-based terminators. These were far safer and, importantly, amenable to automation. This led to the ABI company’s production of automated sequencers, which featured a scanning laser-based readout. Back in the UK, Alec Jeffreys made a serendipitous discovery: microsatellites – highly variable regions in the human genome that provided easy-to-determine genetic markers. This led to the rise of forensic DNA typing (first done for a criminal case near Alec’s native Leicester to provide evidence in a double murder case). A group of enterprising geneticists in France, led by Jean Weissenbach, used these microsatellites to generate the first genome-wide genetic map, based around Mormon families in Utah, who had kept impeccable family records. Clinician scientists were starting to use genetics actively: the first genetic diseases to be characterised molecularly were a set of haemoglobinopathies (blood disorders such as sickle cell anaemia). In these cases, the clinicians were lucky that it was easy to track the protein itself as a genetic marker. A landmark breakthrough, by Francis Collins and colleagues, was the cloning of the gene for cystic fibrosis, using only DNA-based “positional” techniques, without knowing the actual defective protein. This was, at last, a clear, practical application of genomics.

From 1985 through the first part of the 1990s, all of these technologies and uses of DNA were improving, and it became increasingly clear that it was at least possible to consider sequencing the entire genome. However, this was still more of a sheer cliff than a gentle slope to climb. The human genome has three billion letters, a million-fold larger than bacteriophages and 30 times larger than the worm. If the human genome was going to be tackled, it was going to take a substantial, coordinated effort. Debates raged about the best technologies and approaches, the right time to invest in production vs developing better technology, and who, worldwide, would do what.

By the mid 90s things had settled down. The step-by-step approach used in the worm was clearly going to succeed, and there was no reason not to see the same approach working in human. The approach of mapping first, then sequencing was also compatible with international coordination, whereby each chromosome could be worked on separately without people treading on each other’s toes. There was some jostling about which groups should do which chromosomes (the small ones were claimed first, unsurprisingly), and some grumbling about people reaching beyond their actual capacity, but it was all on track to deliver around 2010.

Five large centres offered the biggest capacity:
  • The Sanger Centre (now the Sanger Institute), led by John Sulston with Jane Rogers and David Bentley as key scientists, funded by the Wellcome Trust, a UK charity;
  • US Department of Energy (DOE)-funded groups around the Bay Area in California (now the Joint Genome Institute, JGI), with Rick Myers in the early stages and Eddy Rubin pulling the configuration together;
  • Three US National Institutes of Health (NIH) centres, with oversight from Francis Collins, director of the NIH’s National Human Genome Research Institute:
      • The Washington University genome center in St Louis, led by Bob Waterston with Richard Wilson and Elaine Mardis as key scientists (this was the Sanger’s sister group on the worm as well);
      • The Whitehead Genome centre, formed as part of MIT (now the Broad Institute) by mathematician-turned-geneticist (and part-time entrepreneur) Eric Lander;
      • The Baylor genome centre, led by Richard Gibbs, an Australian transplanted into Texas.
Two other groups claimed a chromosome in its entirety: Genoscope in France, led by Jean Weissenbach, had its sights on Chromosome 14, and a Japanese-led consortium took on Chromosome 21. 
Very often, the genome would be depicted with tiny little flags superimposed, as if it had territories to claim. But happily there was an early landmark agreement, the Bermuda Principles, that stipulated all data would be put into the public domain within 24 hours.

For a few years, the Human Genome Project followed a steady rhythm: large-scale physical mapping followed by sequencing. Chromosome 22 was the first to be sequenced, by the Dunham team at the Sanger Centre. I remember poring over the sequence and gene models of this tiny human chromosome and thinking just how big the task ahead of us was. Chromosome 21 was heading to completion, and many other larger chromosomes were slowly being wrangled into shape.

Then, the sequencing world was turned upside down.

Craig Venter, a scientist/businessman, had been around the academic genomic world for some time, and realised perhaps better than anyone else the potential impact of automation. He had already published the first whole-genome shotgun sequence of a bacterium and, inspired by a paper from Gene Myers (a computer scientist working on text analysis, and converting to biology), realised that a similar approach could work on human. Craig assembled an excellent set of scientists – Gene Myers, Granger Sutton and Mark Adams among others – and persuaded leading technology company ABI to set up a new venture to sequence the human genome – privately. This was at the end of the 1990s, at the start of the dotcom boom when it was anyone’s guess what a viable business model would be. Certainly, holding a key piece of information for biomedical research 10 years before the public domain effort looked a pretty good bet. Celera was born, raised a substantial amount of money on the US stock market and purchased a massive fleet of sequencers and computers.

Naturally, this was quite a shock to the academic project. I remember John Sulston gathering all of the Sanger Centre employees in the auditorium (I was a PhD student at the time) and telling us that this was a good thing – but complex. Behind the scenes there were all manner of discussions, best read about in one of the numerous books that came out. By my own recollection, there was a sneaking respect for Craig’s sheer chutzpah, coupled with a massive sense that one simply couldn’t have one organisation – and certainly not a company – own this key information.
I later discovered that the Wellcome Trust, the large UK charity behind the Sanger Centre, took the important step of backing John Sulston to sequence the entire genome if necessary, to ensure it would be put into the public domain (the US academic components were being asked whether their effort was value for money for the taxpayers). The ability of this charity to “buy in” the genome sequence to the public domain was critical to keeping the genome open (in fact, the US academic projects continued, but it is unclear what would have happened had this stance not been taken). More publicly, there were some quite unseemly spats, for example on the feasibility of the whole-genome shotgun approach.

The academic project also responded to the new, higher-pressure timeline. Rather than keeping with the map-first, sequence second approach, people switched to sequence-and-map as one scheme, but still with mid-size pieces (BACs – around 100,000 letter regions) rather than reads (only 500 letters at a time). This was a half-way point towards whole-genome shotgun and, critically, allowed the five major centres to accelerate their production rate. The nice map with flags across the genome basically disappeared (though each chromosome would then be mapped and finished) and the five centres ploughed onwards, leaving footprints all over the nice, tidy, well-laid plan.
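
To put rough numbers on that difference in scale, here is a back-of-envelope sketch in Python. It uses only the figures quoted in this post (a three-billion-letter genome, BACs of around 100,000 letters, reads of around 500 letters) and counts how many pieces it would take to tile a region exactly once, end to end. That single-pass tiling is my simplifying assumption: real projects sequenced every region several times over, so the true counts were several-fold higher. The function name is made up for illustration.

```python
GENOME_LETTERS = 3_000_000_000  # human genome size quoted in the post
BAC_LETTERS = 100_000           # approximate BAC size quoted in the post
READ_LETTERS = 500              # approximate read length quoted in the post


def pieces_for_single_pass(total_letters: int, piece_letters: int) -> int:
    """Minimum number of non-overlapping pieces needed to span total_letters once."""
    # Integer ceiling division, so a partial final piece still counts as one piece.
    return -(-total_letters // piece_letters)


bacs_per_genome = pieces_for_single_pass(GENOME_LETTERS, BAC_LETTERS)
reads_per_genome = pieces_for_single_pass(GENOME_LETTERS, READ_LETTERS)
reads_per_bac = pieces_for_single_pass(BAC_LETTERS, READ_LETTERS)

print(f"BACs to tile the genome once:  {bacs_per_genome:,}")   # 30,000
print(f"Reads to tile the genome once: {reads_per_genome:,}")  # 6,000,000
print(f"Reads to tile one BAC once:    {reads_per_bac:,}")     # 200
```

Seen this way, working BAC by BAC turns one puzzle of millions of reads into tens of thousands of puzzles of a few hundred reads each, which is roughly the sense in which it sat half-way between the mapped approach and whole-genome shotgun.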

But this acceleration of rate caused another problem: bottlenecks in the downstream informatics. Celera started to crow a bit about their depth of human talent in computer science and the size of their computer farm. This became a real issue. The public project was facing a very real headache of having thousands of fragments of the genome without any real way to put them together. My supervisor, Richard Durbin, was the lead computational person at Sanger and stepped up along with other academic groups, notably the creative, enthusiastic computer scientist David Haussler in Santa Cruz. David and Richard had worked together on and off on all sorts of things, bringing computer science methods into biology, and they – with us, their groups – began to try to crack this problem.

The first problem was assembly. Previously, we were guided by a “physical map” and assembly was effectively done by hand on a computer-based workbench. This needed to change. David was joined by ex-computer-gaming programmer Jim Kent, who felt he could do this. I remember discussing the details of assembly methods and concepts on a phone call, with Jim enthusiastically claiming it was doable and everyone agreeing that Jim should come to the Sanger Centre for a while to absorb the details of overlaps, dispersed repeats and other Sanger genome lore. He packed his bags and left that day, appearing 12 hours later in Hinxton: a jovial, very definitely west-coast American, ready to get to work. Jim then worked constantly for about six months solid (back in Santa Cruz) to create the “golden path assembler”, which provided the sequence for the public projects. Jim also created the UCSC Browser, which remains one of the premier ways to access the human genome (though of course I am partial to a different, leading browser…).

And it didn’t stop there. The public project and the private Celera project were now really swapping insults in public, and Celera said that even if the public project could assemble their genome, they wouldn’t be able to find the genes in this sequence. Thankfully, three of us – Michele Clamp, Tim Hubbard and myself – had already started a sort of ‘skunk-works’ project at Sanger to be able to automatically annotate the genome. The algorithmic core was a program I had written, GeneWise, which was accurate and error-tolerant but insanely computationally expensive. Tim had an (in retrospect, bonkers) cascading file system to try to match the raw computation with the arrival of data in real time. Michele was the key integrator. She was able to take Tim’s raw computes, craft the right approximation (described as “Mini-seq”) and pass it into GeneWise. This started to work, and we made a website around it: the Ensembl project, which provided another way to look at the genome. (Mini-seqs and GeneWise still hum away in the middle of Ensembl gene builds, and are responsible for the majority of vertebrate and many other gene sets.)

Even more surreally for me, the corresponding Celera annotation project was also using GeneWise (I had released it open source, as I would do everything), so I would have a list of bugs and issues from Michele and Ensembl during the day, and then a list of bugs and issues from Mark Yandell and colleagues from Celera overnight. The friendliness and openness of the Celera scientists – Gene, Mark Adams and Mark Yandell – was completely at odds with the increasingly bitter public stance between the two groups.

It was an intense but fun time. Michele and I worked around the clock to provide a sensible model of the genome and features (using – radically at the time – an SQL backend), and there were constant improvements to how we computed, stored and displayed information. We’d often work all day, flat out, and then head back to Cambridge, often to Michele’s house, where we’d snatch a quick bite and watch the latest set of compute jobs fan out across the new, shiny compute farm bought to beef up Ensembl’s computational muscle. Michele’s partner (now husband) James ran the high-end computers, so if anything went wrong, from system through algorithm to integration, one of us was on hand to fix it. As the first jobs came back successfully, we would slowly relax, and eventually reward ourselves with a gin and tonic as we continued to keep one eye on the compute farm.

Eventually it became clear that both projects were going to get there – pretty much – in a dead heat. Given that the public project’s data could be integrated into the private version, Celera switched data production efforts to mouse, much to Gene Myers’ annoyance as he wanted to show that he could make a clean, good assembly from a pure whole-genome shotgun. There was a brokering of a joint statement between Celera and the public project, and this led to a live announcement from the White House by Bill Clinton, flanked by Craig Venter (private) and Francis Collins (public), with a TV link to Tony Blair and John Sulston in the UK.

One figure in this announcement came from our work: the number of human genes in the genome. This is a fun story in itself – I can’t do justice to it now – involving wild over-estimation for over two decades followed by extensive soul-searching as the first human chromosomes came out. I ended up running a sweepstake for the number whereby, in effect, we showed that in the absence of good data, even 200 scientists can be completely wrong. For the press release, it was our job to come up with an estimate of the number of human genes, so Michele launched our best-recipe-at-the-time compute. Bugs were found and squashed, and I remember hanging around, providing coffee and chocolate to Michele as needed (there is no point really in trying to debug someone else’s code in a pressurised environment). Eventually an estimate popped out: around 26,000 protein-coding genes.

We looked at each other and shook our heads – clearly too low, we thought, and went into the global phone conference where the good and the great of genomics said “too low” as well. So we went back and calculated all sorts of other ways there could be more protein-coding genes (after all, a biotech called Incyte had been selling access to 100,000 human genes for over five years). We ended up with the rather clumsy phrase, “We have strong evidence for around 25,000 protein-coding genes, and there may be up to 35,000.”

In retrospect, Michele and I would have been better sticking to our guns, and going with the data. In fact, we now know there are around 20,000 protein-coding genes (though there are enough complex edge cases not to have a final number, even today).

The human genome was done in a rush, with enthusiasm, twice, in both cases in such a complex way that no other genome would be done like this again. In fact, Gene Myers was right. Whole-genome shotgun was “pretty good” (though purists would always point out that if you wanted the whole thing, it wouldn’t be adequate). The public project, John Sulston above all, was right that this information was for all of humanity, and should not be controlled by any one organisation. 

With all the excitement and personality of the “race” for the human genome, it is easy to forget what the lasting impact was. As with all of genomics, it is not the papers, nor the flourishes of biology or speculation about the future that make the impact, but two features of this data: the genome is finite, and all biology, however complex, can be indexed to it. This doesn’t mean that knowing the genome somehow provides you with all the biology – quite the opposite is true. It is often the starting point for efforts to unravel biology. But this was a major phase change in molecular biology, between not knowing the genome sequence and knowing it.

I was very lucky to be at the right place at the right time to be a part of this game-changing time for human biology. Crazy days.

10th genome of Christmas: The laboratory mouse

After human, the most studied animal, by a long margin, is mouse. Or, more strictly, the laboratory mouse, which is a rather curious creation of the last 200 years of breeding and science. 

Laboratory mice originate mainly from circus mice and pet “fancy” mice kept by wealthy American and European ladies in the 18th century. Many of these mice had their roots in Japan and China, where their ancestors would have been kept by rich households. Unsurprisingly, the selection of which mice to breed over the centuries came down to habituation to humans and coat colour rather than scientific principles. 
The founding genetic material for the lab mouse was not just one species, the European house mouse (Mus musculus domesticus), but three: Mus musculus domesticus, Mus musculus musculus (mainly Asian) and Mus musculus castaneus. Because mice have been following humans around for thousands of years, the history of these three species or strains (everything gets a bit murky here, as mice mate if they meet – but Asia to Europe is quite a distance if you are a mouse) is complex, to say the least.

Mice got their start in the genetics laboratory in a rather eccentric collaboration between a Harvard geneticist (W. E. Castle) and a fancy-mouse breeder (Abbie Lathrop), who provided a series of mice with specific traits, such as Japanese Waltzing mice. Abbie arguably ran the world’s first-ever mouse house on her farm in Massachusetts. A student of Castle, C. C. Little, got involved in studying mice and transformed a small hamlet on the coast of Maine, Bar Harbor, into a research laboratory, later named the “Jackson Laboratory” after a generous donor. The Jackson lab (shortened to “Jax”) is still one of the world’s premier mouse research sites.

Mice are excellent mammalian models: they really do have all the cell types, tissues and organs that human has, and so many features (though not all) of human biology, from cellular to physiological, can be replicated and studied in this animal. But it is the detailed control we have over the mouse genome that makes it an exceptional species for helping us understand biology. This control is thanks to two key developments. First, because mouse embryonic stem cells can be produced so easily, there are mouse cells (which you can keep in a petri dish) that can be coaxed into making viable embryos. These embryos can be implanted in pseudopregnant mice, and become full grown individuals. Second, one can swap pieces of DNA in and out in these stem cell lines at will – almost as easily as in yeast (and certainly more easily than in fly or worm). 

The ability to swap, not just insert, DNA segments (“homologous recombination”) is key. This unique-in-animals genomic control of genetics means there are elegant, precise experiments that are only feasible in mouse. For example, one can ‘humanise’ specific genes (i.e. swap the human copy in for the mouse copy), or trigger the deletion of a gene at a particular developmental time-point by using a variety of control elements, ending up with molecular ‘cutters’ that will turn on only when you want them to. Mice are far more than just a ‘good’ model for human – they are arguably the premier multi-cellular organism over which we have the most experimental control.

Given its importance to a massive community of researchers, mouse was clearly going to be the most important genome to sequence, after human.
The Black6 strain (full name: C57BL/6) from the original breeding of C. C. Little was chosen as the strain to sequence, because it was the most inbred and the one most often used in experiments. Indeed, in the public/private race to the human genome (more on this in a later post), the company Celera switched to sequencing mouse when it was clear that the public human genome project was matching the Celera production rate.
Both the Celera mouse data and the public mouse genome data were based on a whole-genome shotgun sequencing approach. This was standard fare for Celera, but signalled the start of whole-genome shotgun sequencing for ‘big’ genomes academically (at least for ‘reasonable’ draft genomes). The inbred nature of mice, Black6 in particular, simplifies the assembly problem for whole-genome shotgun. It’s bad enough trying to put together a 3 billion-letter-long genome from 500-letter fragments – it’s even worse when you have two near-but-not-quite-identical 3 billion-letter-long genomes to reconstruct.
But in many ways, the mouse genome brought us into a new era of genome sequencing: one of routine, ‘pretty good’ drafts from whole-genome shotgun, with fairly routine automated annotation. This was in stark contrast to the step-by-step approach taken with previous genomes, coupled with a more involved, manual annotation. 
Given the importance of mouse to researchers, both the genome and the annotation have been regularly upgraded. The first draft broke the back of the big-genome quandary but, as with many problems, the last 10% of the work, sorting things out, has turned out to be as annoying and involved as the first 90% of the job. After the first draft mouse genome, the next five years were about nailing down the frustrating ~10% of the genome that wasn’t easy to assemble from shotgun, and attending to all the details.

Mouse is also likely to lead us in future to a more graph-based view of reference genomes. As there are inbred lines of mice, one can really talk about “individual” genomes in a solid way, knowing that others can ‘order up’ the same strain and work on them. Thomas Keane and colleagues have been building out the set of mouse strains beyond Black6, and doing increasingly independent assemblies, strain by strain. The resulting set of individual sequences absolutely shows the complex origin of laboratory mice; at any point, some mouse strains are as divergent as two species, and some are more like two individuals from a population. This complex web is best represented as a graph of sequences, rather than a set of edits from one reference, which is the current mode. 

In 1787 Chobei Zenya (from Kyoto) wrote a book, “The Breeding of Curious Varieties of the Mouse”, which apparently had “recipes” of breeding strategies for producing particular coat colours. There are far earlier documents from China on mouse strains, including the “waltzing” mouse (which we now know is a neurological condition). In some sense this is both the rootstock of this laboratory species and part of the motivation for and discovery of evolution and genetics (though Darwin spent more time looking at pigeons than mice).

Given the laboratory mouse’s flexible genetic manipulation, we will be studying this species for at least another 200 years.

9th genome of Christmas: Medaka and friends

My ninth genome of Christmas is a bit of an indulgence: the gentlemanly, diminutive Medaka fish, or Japanese rice paddy fish.

When Mendel’s laws were rediscovered in the 1900s, many scientists turned to local species they could keep easily to explore this brave, new world of genetics. In America, Thomas Hunt Morgan chose the fruit fly. Scientists in Germany explored the guppy and guinea pigs. In England, crop plants were the focus of early genetics. In Japan, researchers turned to the tiny Medaka fish, a common addition to many of the ornamental ponds maintained in Japanese gardens.

Medaka fish are regular tenants of rice paddies and streams all through east Asia, from Shanghai through the Korean peninsula and the islands of Japan, with the exception of the very northern set of islands in the Japanese archipelago. (Naturally, every country has a different name for this fish, but it is most widely used for study in Japan so I am using the Japanese terms.) Fishing for Medaka is as common for Japanese children as fishing for guppies or fry is for European children, and is widely depicted in 19th century Japanese wood blocks.

Medaka also has the honour of being the first organism to show us that cross-over on the sex chromosomes does occur. We now know this to be commonplace, but at the time of its discovery this was a novel observation.

As genetics developed, Japanese researchers continued to inbreed Medaka fish, creating one of the most diverse sets of inbred vertebrates from a single species in the world. Being fish, they have all the cell types and nearly all the organs that a mammal has: tiny, two-chambered hearts, livers, kidneys, muscles, brains, bones and eyes. Conveniently, one can keep lots and lots of them, far more cheaply than mice, and they reproduce regularly, with a generation time of around three months.

But then a different fish rose to prominence in molecular biology in the 1980s. Zebrafish, a native of the Ganges, was chosen by the influential Christiane Nüsslein-Volhard as the basis for redoing her Nobel-Prize-winning forward genetic screens in Drosophila, this time in a vertebrate.

I’ve not yet asked Christiane whether she ever thought about using Medaka rather than Zebrafish, but I am sure that a couple of details of husbandry made Zebrafish very attractive: it lays 1000 eggs at a time, providing for excellent single-female progeny, and is transparent during its embryonic stage, allowing for easy light microscopy of the developing fish. In contrast, Medaka lay only around 30 eggs, and they stick to the female rather than being spurted out, so harvesting them is somewhat complex. Plus, the eggs have an opaque glycoprotein layer, which skilled scientists can remove, but which again makes it harder to study the embryo.

So why am I so interested in Medaka? Well, I was having a beer with my colleague Jochen Wittbrodt, who is one of the rare Medaka specialists outside of Japan, and we were discussing the next stage of experiments. Medaka fish has a neat trick by which one can introduce foreign DNA (e.g. human) coupled to a reporter (green fluorescent protein from jellyfish is a favourite – easy to pick up using a microscope). Even on the first injection, the foreign DNA will often go into every cell. For most other species, you have to get lucky for the foreign DNA to go into the germline, and then hope it will breed true. Jochen had done a number of successful reporter experiments based on designs from my group, and we were discussing whether we could draw on the long history of Medaka research with its rich tapestry of inbred lines to explore the impact of natural variation on these reporter experiments. So, I asked him how many inbred Medaka lines there were, and Jochen nonchalantly replied that he had no idea – after all, his colleague, Kiyoshi Naruse, made one or two new lines from the wild every year or so.

My jaw hit the floor. From the wild? I checked. Jochen confirmed. And then I explored some more, and discovered that there was a whole protocol for creating inbred individual Medaka from the wild.

This might sound trivial, but it is not. Keeping vertebrates in a laboratory is hard. Keeping them in a laboratory when they are inbred, such that their diploid genome is identical everywhere, is extremely difficult. Doing this routinely from the wild is basically unheard of (although this “self’ing” happens all the time in plant genetics). 

Standard theory holds that every individual, whatever the species, has a number of recessive lethal alleles, which will kill the animal if you make both copies the same. The trick to making an inbred line that is truly the same everywhere (i.e. homozygous) is regular brother-sister mating and an awful lot of patience, as at some point you have to find the combination of alleles in an individual that does not have a lethal effect. Normal animal husbandry lore would have it that this is such hard work, particularly with wild individuals, that it is best to just keep propagating the lines created through the hard work of the original founders of whichever organism you are using.
Now, this theory does not hold true for plants, and plant geneticists have enjoyed making inbred lines from the earliest days. And Trudy Mackay, looking at the tricks you can play, created a set of inbreds from wild Drosophila lines. One can study developmental changes by looking at different individuals from the same genetic line, but it has to be at different times. One can study the interaction of genes and environment by raising genetically identical individuals in different environments, but it must be done across a panel of strains that represent a wild population. The model plant Arabidopsis has been used by geneticists to do this for decades; fly geneticists are just starting to. 
This kind of work would have been considered madness in vertebrates. You can’t even keep one or two laboratory zebrafish lines fully inbred – you often need to add back a bit of diversity. There are established inbred laboratory mice, but from a weird multi-species hybrid. Single, wild-derived mouse strains have been established, but not at scale – not least because of the complications inherent to keeping mouse facilities pathogen-free, which makes everyone a bit paranoid about wild mice in a laboratory setting.
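
For a feel of how much patience brother-sister mating takes even before recessive lethals get in the way, here is a small Python sketch of the textbook population-genetics recurrence for full-sib mating, in which expected heterozygosity follows H[t] = H[t-1]/2 + H[t-2]/4. This is a standard result that I am bringing in for illustration, not something taken from the Medaka protocol. It assumes neutral loci with no selection at all, so it is the best case, and the function name is made up.

```python
def sib_mating_heterozygosity(generations: int) -> list[float]:
    """Expected heterozygosity relative to the wild founders, by generation.

    Index t of the returned list is the value after t generations of
    brother-sister mating (index 0 is the first, non-inbred sibling pair).
    Assumes neutral loci only: no recessive lethals, no selection.
    """
    h = [1.0, 1.0]  # the unrelated founders, then their offspring
    for _ in range(generations):
        # Full-sib recurrence: H[t] = H[t-1]/2 + H[t-2]/4
        h.append(0.5 * h[-1] + 0.25 * h[-2])
    return h[1:]


decay = sib_mating_heterozygosity(20)
for t in (1, 2, 3, 10, 20):
    print(f"after {t:2d} generations: {decay[t]:.3f} of the original heterozygosity")
```

Under these idealised assumptions the first three generations leave 0.75, 0.625 and 0.5 of the starting heterozygosity, and it takes on the order of twenty generations to get down to a percent or two. With a generation time of around three months that is several years of fish-keeping per line, with every recessive lethal uncovered along the way threatening to end the line.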

But in Medaka, it could be doable. Impressive.

Jochen introduced me to Felix Loosli, the best Medaka breeder outside of Japan, and Kiyoshi Naruse, one of the leading breeders in Japan. The four of us have undertaken to generate and characterise a Medaka inbred panel from a single wild population (unsurprisingly, very close to Kiyoshi’s lab, in Nagoya). 

The Medaka genome has of course been sequenced, in a relatively standard, somewhat quirky way by a Japanese group. This genome is a pretty standard fish genome, around a third the size of human. Medaka are close to some other evolutionarily interesting fish: the stickleback, beloved of ecologists thanks to the numerous species that form in different river and lake systems; cichlids, with a similarly diverse set of species living around the African lakes; and Fugu, loved by sushi gourmands (because of the powerful neurotoxin which, so long as it is only in trace amounts, produces an intriguing taste) and by genomicists as the vertebrate with the smallest genome.
Together, these four funky fish will, I hope, push forward research into vertebrate genetics with evolution, ecology, and environment. Our own contribution is in creating the first ever inbred-from-the-wild panel in vertebrates.

Watch this space.