These are tumultuous times for those of us invested in improving science through transparency. Last week’s episode featured, on the one hand, a group of high-ranking medical journal editors coming forward with a joint proposal on data-sharing standards for their journals. On the other, a few days later an editorial in the New England Journal of Medicine condemned the “aerial view” of data-sharing as a public good, claiming that it would lead to “parasitic” behavior by scientists who re-use the data for their own research, conduct poor-quality meta-analyses, or check the original researchers’ work. The push for, and against, opening up scientific practice through data-sharing is also intensifying in the social sciences. Late last year, hundreds of political scientists opposed a move by top political science journals to adopt new data-sharing policies.
Those who worry about data-sharing raise some legitimate points. For instance: how do we ensure participant confidentiality? Who will pay for the infrastructure, and the often considerable effort, needed to share data so that others can interpret and use them? And how will researchers be rewarded for sharing their data, when publications are the coin of the academic realm?
But let’s be clear: as difficult as it is to change norms and incentives in a vast system, we can’t afford not to. We have created a system that rewards exciting publications, leaving null results in the file drawer and skewing the published record. In many fields there is plenty of room to dig through the data for p-values, leaving a big open question about which papers are merely reporting statistical noise. And we have a poor track record of trying to repeat studies; when we do, results often fail to reproduce.
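How digging for p-values produces statistical noise is easy to see in a quick simulation. The sketch below is purely illustrative (Python standard library only; the data and all names are made up): every dataset here is pure noise, yet a researcher who quietly runs twenty comparisons and reports whichever one “works” will find something significant most of the time.

```python
import random

random.seed(42)

def noise_study(n_tests=20, n=30):
    """Run n_tests two-group comparisons on pure noise and report
    whether any of them crosses the naive 5% significance threshold."""
    for _ in range(n_tests):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        diff = sum(a) / n - sum(b) / n
        # z-test with known unit variance: the sd of the difference
        # in means is sqrt(2/n)
        z = diff / (2 / n) ** 0.5
        if abs(z) > 1.96:  # ~5% false-positive rate per single test
            return True
    return False

trials = 2000
hits = sum(noise_study() for _ in range(trials))
print(f"share of noise-only studies with a 'significant' result: {hits / trials:.0%}")
# theory: 1 - 0.95**20, i.e. roughly 64%
```

Each individual test behaves exactly as advertised, with a 5% false-positive rate; it is the unreported multiplicity that inflates the chance of a spurious finding to about two in three.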
How can data-sharing help? To be honest, it obviously can’t help with all of these problems. No amount of sharing data from published studies will bring the buried studies out of their file drawers, or automatically curtail the failure to account for multiple hypothesis testing. But it can help with some things: it allows others to verify results, to an extent, by running the statistical code on the shared data and confirming that it produces the published findings (as we do in our code checks at Innovations for Poverty Action, along with sharing data from studies). It also allows for robustness checks to see how fragile the evidence is to minor changes in the way the analysis is conducted. While debates about the strength of evidence can be difficult, they are essential: we should be debating whenever the claims are far from obviously true, as is the case with most scientific research.
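A robustness check on shared data can be as simple as re-running the analysis after one small change. Here is a hypothetical sketch (Python standard library only, made-up data; not any particular study’s method) in which dropping a single outlying observation moves a regression slope noticeably, which is exactly the kind of fragility a re-analysis can surface:

```python
import random

random.seed(1)

# made-up data: y = 0.5*x + noise, with one large data-entry error
x = [i / 10 for i in range(40)]
y = [0.5 * xi + random.gauss(0, 0.3) for xi in x]
y[-1] += 8  # hypothetical outlier

def ols(xs, ys):
    """Slope and intercept of a simple least-squares fit."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return slope, my - slope * mx

slope_full, icept_full = ols(x, y)

# minor analysis change: drop the observation with the largest residual
resid = [abs(b - (icept_full + slope_full * a)) for a, b in zip(x, y)]
worst = resid.index(max(resid))
keep = [i for i in range(len(x)) if i != worst]
slope_trim, _ = ols([x[i] for i in keep], [y[i] for i in keep])

print(f"slope with outlier: {slope_full:.2f}, without: {slope_trim:.2f}")
```

If the headline estimate survives this kind of perturbation, our confidence in it should grow; if it does not, readers deserve to know, and without the underlying data no one outside the original team can run the check at all.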
And more than that: data-sharing is a move toward opening science up. We shouldn’t have to trust someone’s say-so in their published paper. We should see what they did, and how, and be able to delve deeply into their work. Those who want to verify important scientific results are not parasitic. They are recognizing that we are all fallible, that science is difficult, and that the more we can do to improve our evidence, the better off we will all be.