Publishing Big Data Science

This is the third and final post in a series in which I share some lessons learned about how to plan, manage, analyse and deliver a ‘big biodata’ project successfully.

Now that you have the results of your carefully planned, meticulously managed and diligently analysed experiment, it’s time to decide on what to publish, and where.

1. Present your work

I love presenting, because having to explain my work to a mixed audience helps me understand and articulate the science better, and to convey the excitement of discovery. What is the work for, if not the joy of exploration? Creating figures to use in a presentation is enjoyable, and helps me get my thoughts in order.

I find writing papers less enjoyable than giving presentations, but the same core is present in both – good figures which provide a strong narrative from design through to analysis. There is, however, a particular rigour in writing a paper that brings out the best in a piece of scientific work. Present, and publish – it’s important to us all.

2. Organise your material

Most of these papers comprise both a main paper and a supplement. The main paper will feature the figures that tell the story: experimental design, discovery, main findings, interesting cases. It should be written for the interested reader who will mainly trust you on the experimental and analysis details. The supplement is for the reader (including a reviewer or two) who does not trust you. Sometimes, on other people’s papers, you will be that reader. The supplement should have the same flow, but have all the supporting details that tell that reader the data and analysis are kosher.

3. Figures first

Make good figures that illustrate your point, and test them out in presentations, first to the group, then to colleagues in your institute, and then more widely. You’ll fine-tune the figures as you go. Your presentation will need quite a bit of scaffolding (why the question is interesting, your experimental design, key statistics), but don’t be afraid to show sample data from your results to illustrate your motivation. Consider showing a boring case and an interesting case side by side. You may find this scaffolding can be condensed into your Figure 1 for the paper. You can show other figures in the supplement if they support your work.

4. Put pen to paper

Once your figures flow, you can write the results. You can also start working on the supplement, following the same general flow. All the ‘data is good’ plots will go in the supplement, as it can have extra “lemmas” about the data. Don’t skimp on the supplement – include the technical details that support your choices, such as why your normalisation is sensible, or better than other approaches. If the supplement gets big, provide an index to the supplement for navigation. The sceptical reader will like to see this.

5. Focus on the results

Write the introduction and discussion once you are happy with your results write-up. Think about the readers and the reviewers, and make sure to cite widely. If you are coming into a new arena with this high-throughput approach, lavish praise on the importance of the field and the massive amount of work on individual loci on which you are building. Basically, if you are publishing a large-scale approach in an area that hasn’t had one, avoid being seen as an interloper; read the papers, cite them – and you are likely to find a couple of new angles on your work through this process.

6. Length angst

If you are aiming for a journal with strict length limits (and I do wonder why we tolerate this in this day and age), don’t let that hold you back at the submission phase. Write as much as you need to, and acknowledge the length in your cover letter. Emphasise that you want the reviewers to have a full understanding of the science. For these space-restricted papers, reviewing text compressed to the final density is often really hard – the text can be cut down after review.

7. Be open

It is pretty standard that you will eventually be publishing open access (certainly if you are funded by the NIH, the Wellcome Trust or other funders with similar mandates). It is easier to do this via journals which automatically handle the open access submission (PLOS, Genome Biology, the BMC series and many others, sometimes with open access fees). Due to the funder mandates, pretty much every journal will at least allow submission of your author manuscript to PubMed Central, but doing it yourself is quite annoying.

There are also new experiments in open publishing to look at. Two examples are F1000 and bioRxiv. In F1000 the whole process of submission, peer review and publication is done in the open – it is interesting to watch open peer review in action. bioRxiv follows the model of the physics pre-print server, arXiv, and many journals allow pre-print posting whilst a paper is under review. This is a cool way to avoid being scooped and provides a way to get community input (“informal peer review”). I think we’re in an experimentation phase of this next stage in open science, and it’s going to be interesting to see where we end up.

8. Tidy up and submit your data

Make sure you have all the raw data to submit, with the meta-data nicely tidied up (ideally, your LIMS will have this ready to go by default). Submit your structured data (DNA, proteomics, metabolomics, X-ray structure, EM) to the appropriate archive (EMBL-EBI has the full range). Collect all the intermediate datasets and files in one directory, kept in house at a minimum and ideally put on the web. This is good for transparency – sceptical readers will be even more reassured when they know that they can (if they want) not only get the raw data (a given for molecular biology) but can also come into the analysis half-way through. About half of these readers could be future members of your own group, whom you may ask to “follow the analysis in paper A”, or to confirm that “XXX did this in paper B”. Do this for your own group’s sanity and for extra brownie points from readers around the world.
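One way to make a directory of intermediate files easier to navigate and to verify is to ship a manifest alongside it. The following is a minimal sketch in Python, not a prescribed method: the function names, the `MANIFEST.tsv` file name and the `intermediate/` directory are all hypothetical, and it uses only the standard library. It lists every file with its size and SHA-256 checksum, so that a reader who downloads the files can check they have exactly what you analysed.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list[dict]:
    """List every file under root with its relative path, size and checksum."""
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return rows


def write_manifest(rows: list[dict], out_file: Path) -> None:
    """Write the manifest as a tab-separated file with a header row."""
    with out_file.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["path", "bytes", "sha256"], delimiter="\t"
        )
        writer.writeheader()
        writer.writerows(rows)


# Example use (hypothetical paths; adjust to your own layout):
#   rows = build_manifest(Path("intermediate"))
#   write_manifest(rows, Path("MANIFEST.tsv"))
```

Writing the manifest outside the directory it describes, as in the commented example, keeps the manifest from listing itself. The supplement can then point to the manifest rather than describing each intermediate file by hand.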

3 Replies to “Publishing Big Data Science”

  1. I am studying social science and plan to use data research techniques to test some of my hypotheses on the nature of human cooperation in different environments.
    I heard that data science has awesome applications in biology and genetics. It seems that is true enough.

  2. To practice Data Science, in the true meaning of the term, you need the analytical know-how of math and statistics, the coding skills necessary to work with data, and an area of subject-matter expertise. Without subject-matter expertise, you might as well call yourself a mathematician or a statistician. Similarly, a software programmer without subject-matter expertise and analytical know-how might better be considered a software engineer or developer, but not a data scientist.
