Reflections on reproducibility, digital communication and open science
Is science sound? There has been a sustained discussion about this over the past five years – ever-present in the background, and punctuated by intense public debates, both in the scientific press and more broadly. There is a host of concerns – from reproducibility of science to incentive structures – all focused ultimately on how we know what is true and what is not. The answer is not always straightforward.
Science’s health record
A whole new field of ‘meta-research’ has emerged, in which people are carefully probing the health of scientific practice in its various forms. Some of the diagnoses in biology’s health record include:
- analyses of the straight-up reproducibility of experiments
- new requirements imposed by journals
- proposals for ‘reproducibility incentive structures’
- exposing the bias inherent in the scientific reward system itself.
These are all very important discussions to have, whether they are new problems or just flare-ups of old ones. But science itself is starting to look like a bit of a hypochondriac.
It takes a village
However, a deep understanding of the solidity of facts in a field – or lack thereof – is usually shared by the vast majority of the practicing scientists immersed in any sub-discipline. Just like in a small village, digressions are very hard to hide.
There is far less angst than one might expect over what is true and what is not among people who are busy unpicking a particular set of living processes – usually together. And, like in a village, social processes can allow other aspects of bias to creep in.
I’d like to share my own perspective on how reproducibility is woven into the fabric of practicing science – particularly biomedical and life sciences.
Science v. papers
Experiments and analysis: the protons and electrons of science
The fundamental units of science are original experiments and analysis, which are described in papers – sometimes singly, sometimes collated. One can reproduce or refute experiments and analyses, not papers.
Both experiments and analyses can be done well or poorly, and results are considered ‘solid’ only when both pass muster.
Every scientist I know makes a clear distinction between the core experiments and analyses (what appears in the Results section of a paper) and the more speculative musings about what these results imply (the domain of the Discussion section). Often, these are interwoven in papers, which is interesting but exasperating.
The quality of a paper is judged superficially by the number of times it’s been cited by others, or by the reputation of the journal in which it appears. Looking more closely, a paper might present a mix of sound and unsound data and analyses.
It is the reproducibility and robustness of these structural units that can be called into question – not the paper itself.
Replication: explicit or implicit?
Explicit replication is rare, but implicit replication in control experiments is common.
The apparent lack of replication across fields is the source of much frustration. So, too, is the seemingly cavalier attitude scientists have towards the preference for exciting experiments over careful, probing tests of other people’s findings.
But there is a reason for this lack of enthusiasm for extensive discussions of previous experiments – namely the broad, extensive set of control experiments – sometimes published, quite often not – that ensure the validity of a particular experimental path for the researchers exploring an area.
Control experiments range from tests that ensure reagent batches are correct to confirmatory analyses from large-scale datasets that test the soundness of a particular dataset (and, accordingly, the soundness of a previous experiment and analysis).
Frustratingly, this is such common practice that it is not well tracked. Indeed, many of these control experiments and analyses are not published, and when they are, they are buried in supplements. Their appearance in Results might support the reasoning behind new experiments or the soundness of a dataset; they would rarely be presented as a confirmation of the previous results.
A strong weave of experiments and analysis
American physicist Richard Feynman said it best: “you must not fool yourself – and you are the easiest person to fool”.
“We’ve learned from experience that the truth will come out. Other experimenters will repeat your experiment and find out whether you were wrong or right. Nature’s phenomena will agree or they’ll disagree with your theory. And, although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven’t tried to be very careful in this kind of work.” – Richard Feynman
To earn respect, a scientist must be his or her own biggest sceptic, and maintain a cynical mindset about his or her own work. You don’t have to take my word for it – most scientists design interlocking experiments and analyses, nowadays often drawing on other people’s experiments to provide more compelling evidence supporting a particular result.
Direct replication or meta-analysis are not required in all fields, though in some they are critical – sometimes to the point that you can’t publish without your own internal replication. More commonly, one aims to include orthogonal experiments to support a result from more than one angle. This ‘weave’ of results gets stronger as more (and more varied) scientific groups add their experiments, which is one of the reasons collaboration is so desirable in science.
Implicitly, scientists know that all laboratories (groups of people) come with their own biases, strengths and weaknesses. When different laboratories confirm the same results (usually in somewhat different ways), it adds a surveillance layer, giving confidence that a set of experiments and subsequent analyses were handled correctly. The more groups – and the larger the diversity of groups – the stronger the weave and the less likely a collective, systematic failure will happen.
Weak or wrong results are rarely refuted in the literature
In terms of the literature, wrong results and weak experiments are like ghosts in the house of science.
Specialist communities know almost immediately where and why an experiment or analysis went wrong, which sometimes prompts a (somewhat old fashioned) letter to the editor, which is declined more often than not. But few people go through the hassle of actually writing down their contrary opinion at all, never mind submitting it for publication. This lack of transparent correction in the literature is extremely frustrating for anyone outside that speciality.
When things are working well in a scientific community, it is apparent in the literature – mainly because solid results are highly visible. One can see how useful an experiment was even two or three years after it was done, even if the subject is not familiar – as long as the experiment helped support the final view. If you’re lucky, a well-written review, citing a broad range of scientists whose work supports a consensus view, will give you a clear idea of the state of the art. That review is not going to feature incorrect experiments and analyses, unless it’s making a point about quality.
Weak or wrong experiments effectively fade away.
Science is social
Science is a very social, truth-seeking process.
If the literature were your only window on science, you might not see how incredibly social it is. We meet regularly, are dreadful gossips and cultivate our reputations and positions carefully as risk takers, arbiters or consolidators, falling easily into archetypal roles. One scientist runs every control possible, mulling over every possible ‘gotcha’ in their analysis. Another revels in posing challenges and skewering current accepted opinion – perhaps at the expense of rigour. Both are needed.
Coming back to the village analogy, one’s reputation in the scientific community is unavoidably linked to personal aspects like ego (or lack thereof), presentation skills (or awkwardness), and relative position on the collegiate/selfish behaviour scale.
This social process is a basic part of keeping science transparent. It requires that people meet regularly – not just e-mail drafts of papers or mention one another on social media. Almost invariably, a field is defined by one or two key annual (or biennial) meetings where people expose their work to the praise and ridicule of their peers.
What happens at scientific meetings
Scientific meetings are about two things: presenting discoveries (experiments and analyses) and sparking new collaborations, often in ways that involve food and drink. They provide a social framework for recognition, tough questioning (usually more brutal than peer review), horse trading and gossip.
Much of the picking apart of current, active science happens in these free-trading, no-holds-barred, peer-to-peer discussions at key meetings. The questions asked publicly (sometimes mortifyingly) are far more critical than anything you would find published in the literature.
Outside the sessions, overheard snippets of conversations might go something like, “What do you think about Anne’s view on histone methylation?” or “We just can’t get anywhere near the same activation of our LCL lines compared to Alex’s group,” as people swap notes on talks they’ve just heard.
Basically, these major meetings are where people make or break their reputations. As such, they are a key part of the quality-control machinery and establishment of a consensus view of “truth”.
The pitfalls of relying on social structures
Relying on social processes to rapidly sort a field’s results by quality and agree which ones matter comes with built-in costs, both in overheads and risks.
Science is practiced by humans, and it is unsurprising that we, collectively, behave pretty much as average humans. Human foibles are as alive and well in science as they are everywhere else, so there are always random effects that are hard to capture. Unconscious biases, most notably around gender and diversity, are well documented, with ‘up weighting’ and ‘down weighting’ effects running alongside the more impersonal discussions about science.
To our credit, many scientific groups are addressing this directly, first by recognising problems and, unsurprisingly, quantifying them. Proactively addressing them by, for example, actively looking for gender balance in speaker lists, is the crucial second step.
There is still a long way to go, and far more diversity to embrace beyond gender. If innovation is really at the heart of what we do, and the quality of our analyses relies on maximum diversity, we have some serious distance to cover, and fast.
Plus ça change
Over the past 25 years or so, I’ve witnessed and been part of some major scientific moments (and far, far more minor but still exciting ones!), and experienced countless moments of revelation, large and small. The idea that science might be unravelling simply does not fit my experience. I’ve not seen any major change in the process, and historical accounts of science show that the culture has been consistently social and critical. The way we consume information may have changed (e.g. tweets v. telegrams), but the fundamentals are the same.
In fact, far from unravelling, science has been reinforced by the establishment of virtually impregnable scientific truths. I’d challenge anyone who might claim, for example, that TP53 is not a key protein monitoring DNA replication, which is often damaged in cancer. Or that C. elegans does not have a fixed cellular development, with some programmed random choices for specific cell fates. Few people are going to dispute that autophagy is part of Crohn’s disease development. Why would they?
The strong weave of so many experiments – sometimes an overwhelming number – in so many laboratories supports these statements as truths. Yet when they were first put forward they were exciting, difficult to process, sometimes radical ideas in their own field.
How we could do better
Creative scientific thinkers are always putting forward exciting, difficult-to-process, radical ideas. Some of them are borne out, and some are just plain wrong. As it turns out, repeats do not make it “impossible” to do whole-genome assemblies of complex eukaryotes. Humans don’t actually have “over 100,000” protein-coding genes.
Fortunately, the wrong turns are harder to remember (though that might be my optimism showing through).
But science could do more to achieve a better sense of solidity. We should always explore the consequences of different accepted processes and practices within sub-disciplines. There are strange backwaters in science, and odd tolerance of practices that run against the overall goal of understanding.
The four grand challenges of data science
To add my own prescription to the record, here are some key positive changes that I would like to see across all scientific disciplines.
1. Open Data and Open Analysis are equally important
Everything works better when scientists can – with little hassle – look at, fool around with and re-use experimental data and analyses from other groups. I am lucky that the scientific field I grew up in – molecular biology – has always had a strong data- and resource-sharing mentality, taken to a pretty high level in genomics. It would feel deeply wrong for me not to make my data (raw and intermediate) available on publication, if not pre-publication.
Without the luxury of open data baked into its culture, any scientific discipline will be bound by contrived, tortuous justifications for hoarding its findings. In my experience, 90% of the ‘it won’t work in our field’ attitude has simply been handed down through the ‘generations’. The remaining 10% is down to field-specific details about the sociology, data models and practical processes for data sharing.
Sharing analysis is just as important as sharing data. Routinely checking analyses and establishing sensible processes around releasing code are both required to make this work. It’s also really important to spell out the details of analyses very explicitly. Molecular biology and genomics are not exactly exemplars here – more ‘middle of the pack’. We can (and should) do better.
We should all be sharing our code, and building more useful ways to provide complete audit trails for analyses.
2. Embrace pre-prints
arXiv proved the principle that pre-prints – papers published on the Internet without (or before) peer review – are good for scientific progress. For various reasons, this success was ignored for too long (more than 20 years) in molecular biology and other health-related fields.
Pre-prints allow new information to be shared faster, bypassing the arduous process of a study being chosen by a journal (or, conversely, authors settling on a journal). Selection, not peer review, gets in the way of communicating findings within sub-disciplines – some important, others minor but still useful.
However, pre-prints are now taking hold via systems such as bioRxiv, PeerJ and others. It will take a while for the life-science community to adjust to pre-prints, but there are plenty of incentives for it to catch up.
3. Use new platforms to publish better, frictionless, minor ‘papers’
Journal selection takes up too much of our time and angst. Smaller, high-quality publications have been emerging on the digital scene since overheads started to come down. PLOS ONE, PeerJ and F1000Research are just a few examples.
But the social world of the life sciences has not adjusted to this cornucopia of platforms, nor has it embraced the freedom they represent. We don’t take advantage of it to publish small, useful findings that don’t fit into a traditional “article” mode, even though doing so has a lot of potential to improve scientific communication.
4. Work the digital furrowed brow
I believe we need a new type of minor publication that flags conflicting findings, for example when one set of experiments fails to replicate a published set of results.
We should not shy away from these, even though it is tempting to avoid the opportunity cost of chasing dead ends. The current burden of publishing a strong negative result that contradicts published work is, weirdly, far higher than publishing a different positive result. Furthermore, all the good hints that can be gained from reading about an unproductive experimental approach are never passed on; rather, they are hidden away deep in the archives of sub-disciplines.
We need a more positive, less confrontational way to talk about discrepancies, and to embrace the long-term value of bringing them to people’s attention.
That will take a lot of work on the social side, so that the personal costs are not ruinous. We need all the good scientists we can get, and we should frame our questioning differently. For example, we should move away from casting people who make mistakes as villains (“Hang your head in shame, for you have broken our trust! Begone!”) and instead allow everyone to learn from them (“It’s easy to make mistakes, and I think you’ve let one slip through here.”).
Running with it
The Internet has transformed science communication, but we are far from seizing all the opportunities it affords. We could be communicating more independently, on a far wider scale (from micro to mega), considering far longer timeframes up front (immediate to archived).
The breadth and reach of the Internet also offer the potential to reduce biases in gender and diversity for scientific processes – but by no means will this happen automatically. It will allow greater integration and transparency between close-knit communities and the wider scientific community, and between people working in science and those who are simply interested in it.
I believe we should embrace a more experimental approach to sharing and discussing the outcomes of life-science experimentation and analysis. We have the tools to reflect more deeply on how our practice is constructed, and to consider whether this practice is helping us achieve the full potential of scientific enquiry.
We should use every tool we have – social or technical – to its fullest extent, always with the aim of developing a deeper and better understanding of the world we live in.