Openness and transparency in science are hailed by policy makers, funders and scientists as a trend that will drive better science, faster advancement and innovation, and wider socio-economic benefits. Ongoing research, which aims to deliver policy recommendations for Open Access to Research Data, reveals that policy on open data may be driven by a strong belief in, rather than evidence of, these benefits. Furthermore, in the view of scientists, current policy still lacks a clear route to operationalising open access to publicly funded research data.
A reading of the current policy literature on the broader topic of open science shows that opening access to publicly funded research data is perceived to have the potential to drive faster progress in science. It would do so by minimising duplication of effort and by offering scientists a wider range of data for re-analysis, comparison, integration and testing. There is also a belief that open access to publicly funded research data will yield economic benefits through re-use. This belief can be seen in references to data as an untapped resource, as a currency, and as a public good: its production is funded by public money, and it should therefore be accessible to the general public.
Interviews with practising scientists in five academic disciplines, conducted for the RECODE project, reveal a more complex picture. Although the scientists are generally very positive about open access to research data, their concerns about operationalising open research data policies highlight issues that could become significant barriers to implementation.
One key point, raised by all of the scientists, concerns the practices needed to establish the ‘meaning’ of data, which is necessary for successful re-use and integration. Data without sufficient explanation (e.g. metadata, coding, and a description of the research design and questions) is of little or no use for further research. Opening up access to publicly funded research data is thus significantly more complex than simply placing a spreadsheet or a database online.
‘People can do limited work with a dataset that is not well documented. I do worry that people will just think “Oh, I need to archive my data and then it’s done”, but it goes beyond archiving. The key question here is: “Is the data set going to be re-useable 40 years down the line, when you are not around anymore?” Just because it is archived does not mean it is reusable.’ (Scientist, Archaeology)
‘The disadvantage, especially with experimental work to put it in such a format that it can be directly and easily used by others, that involves quite a bit of work. That is in my view quite a substantial hurdle. It is one thing to put data in an Open Access data base but it is another thing to put it in such a way that you do not need an extensive explanation to be able to use it.’ (Scientist, Health Research)
In many cases, significant and time-consuming work is needed to establish the context outlined above. For some types of data, such as experimental data from particle physics, a secondary user would need ‘the reconstruction programs, the simulation and its database, the programs that handle the simulation and (…) access to the physics generators’ (Scientist, Particle Physics). In the scientists’ view, all of this extra data work is currently neither funded nor rewarded with academic recognition. As the peer-reviewed paper is still the key measure of academic success, there is no incentive to spend time writing metadata for secondary users.
As for funding for data work, the scientists agree that it would serve as a driver for furthering open research data policies. Their concern, however, is that at a time when overall research funding is dwindling, emphasis will be placed on opening up access to all publicly funded research data, irrespective of how relevant or usable it is to the wider population.
‘You might end up wasting millions of pounds and then only 10 people are interested, that is a waste of money and work.’ (Scientist, Particle Physics)
An example is the petabytes of data produced each year by the Large Hadron Collider experiment. The data requires specialist knowledge and equipment to understand and use, and its sheer size means that providing storage and long-term data management would be an expensive undertaking.
No single open data policy will suit all disciplines. The key issue for policy makers is to ensure that research communities participate in the further implementation of open data access, so that data is made open in ways that make it ‘accessible; intelligible; assessable; and usable’. Without these four qualities, which are highlighted in the Royal Society report Science as an Open Enterprise, the full benefits of Open Access to Research Data may not be realised.
The University of Sheffield is one of the partners in the RECODE project, an EU FP7-funded project focused on providing policy recommendations for open access to research data in Europe. See http://recodeproject.eu/.
Thordis Sveinsdottir, Research Associate, Department of Sociological Studies, University of Sheffield. firstname.lastname@example.org.