A corner-stone of science is that claims are based on evidence, and that evidence should be available for others to check. Yet it is surprisingly difficult, at least in some fields of science, to obtain the original data behind the claims made in a publication.
For example, Wicherts et al. [1] asked the authors of 141 papers published in 2004 in premier psychology journals for the data behind their claims. 73% of the authors did not provide those data: some refused to share their data (35%), some promised but then did not deliver (20%), and some did not reply despite repeated emails (14%). This is despite the authors having agreed to the journals’ requirement to “not withhold the data on which their conclusions are based from other competent professionals who seek to verify the substantive claims through reanalysis and who intend to use such data only for that purpose” (page 396 of [2]). While exceptions are allowed where sharing might violate participants’ confidentiality or commercial sensitivity, such exceptions were unlikely to apply to these papers.
Changing behaviour
There are several factors that might contribute to scientists’ reluctance to share their data. Sometimes they want to do additional analyses of the same data in future, and don’t want to be “scooped” (for a recent example of this in neuroscience, see [3]). This doesn’t wash with me: if we cannot ask for data in order to reproduce a published claim, we are undermining that corner-stone of science. More often, the barrier is the effort needed to organise data in a format suitable for sharing with others, and scientists argue they are too busy. I think this just requires a change in scientific behaviour and training, so that proper documentation occurs during the acquisition of data, with a mind to future sharing, rather than in a panic after publication. This behaviour is becoming increasingly important as journals require evidence of data sharing, sometimes even on submission of a paper. Combining this with proper Data Management Plans (as required by most funders) also solves the common problem of the lead author on a project moving to a new position and the remaining authors being unable to interpret what is left behind.
How we now share data at the CBU
In 2017, while I was Acting Director of the MRC Cognition & Brain Sciences Unit (CBU) in Cambridge, our Open Science Committee enacted a new institutional policy on data sharing: any publication involving a scientist at the CBU now has to be associated with a data record that is logged in a repository. This record includes a description of the location of the raw data and instructions (often computer code) for reproducing the results in the paper. When anyone wants the data associated with a CBU publication, they can request them via our (searchable) data repository. At the moment, these requests are sent to the main CBU author for confirmation, but in most cases they can be handled by our Information Officer (i.e., a “managed-access” system).
While I’m sure this new policy was painful for many CBU scientists, the majority seem to accept that it not only benefits science (ensuring the CBU would not fall foul of a meta-analysis like that of Wicherts et al.), but also benefits their own research when they revisit or replicate past data in the future (e.g., after a post-doc has left). Similar procedures may already be in place at some institutions, but I suspect that in many other cases such data curation is left to the lead or senior author, and some of these may need a “stick” to encourage this behaviour (rather than just more carrots); the institution may be the right level at which to provide such a stick.
Informed consent for participants
Other issues arise when sharing data from human volunteers. The datasets in our repository are anonymised; however, it remains possible that individuals could be identified by “triangulation” across a number of variables in a dataset (e.g., a 98-year-old woman with 12 years of education living in a certain town). This could then be used to recover new information about that person, such as their scores on IQ tests. We try to prevent this by requiring people who request data from our repository to agree not to try to identify individuals (via a web-form that we log). More importantly, we explain this risk in the information sheets we give participants before they consent to the study.
Indeed, it is important to consider the consent given by participants. Our standard consent forms used to say that data would only be shared with members of the research team, and possibly with other researchers under a collaboration agreement. In 2017, we changed the relevant consent box to: “I agree that my anonymised research data from this study will be kept in the long-term, may be combined with data from other CBU studies to answer new research questions, may be shared with other researchers or may be made ‘Open’ without new consent being sought from me.” As of 2020, not one participant has refused to endorse this statement.
Going further on data sharing
Looking forward, I would hope that we can make data truly “open access”, in the sense of being directly downloadable (after clicking through some basic agreements), without requiring approval from the original researcher. Looking further ahead, I would hope many scientists consider sharing their data immediately after collection, i.e., before publication. Too often data are “sat on” for many years while a scientist hopes to get round to analysing them, and in many cases the data never see the light of day [4]. While this prevents the scientist from being “scooped”, it is also potentially detrimental to the progress of science. This hoarding behaviour seems particularly egregious when the scientist has been funded by public (e.g., tax-payer) money. In fact, I suspect that the fear of being scooped is often a “paper tiger” [5], and few scientists would use data without crediting the original collectors, or perhaps suggesting a collaboration. The advent of “data papers”, in which data are described without any analysis or interpretation [e.g., 6, 7], provides an opportunity to formally credit the data collectors while also publishing data as soon as possible. In short, the sharing of data is a vital ingredient of the BNA’s drive for credible neuroscience.