Last week I attended a conference to discuss research data services. The focus was on the UKRDS project, one of a number of feasibility studies funded by HEFCE. Existing research data is something of an untapped resource; it is rarely reused by those who create it, let alone by others. And the volume of data (and hence the size of the untapped resource) will grow substantially over the next few years.
The UKRDS project was looking to address a number of issues, the biggest of which are probably cultural rather than technical. We heard that researchers are just not used to putting the data they generate into the public domain: it is often poorly documented and lacks the quality metadata that would allow it to be found and used. The Australian National Data Service also identified that, for researchers, the cost of contributing data to its data service outweighed the benefits gained from publishing the data. One school of thought was that including data citations in research metrics would encourage more researchers to deposit data. Another was that more ‘stick’ was required, making publication of data a condition of funding and/or a matter of Government policy. What was clear was that researchers would need to be trained in how to deliver well-documented data that will be of wider use, and that the mechanism for producing that data needs to be low cost. Researchers will also need training in data mining techniques so that they are able to take advantage of the research data held.
There will be technical issues. Tools need to be developed to allow easy access, and ideally there needs to be a standardised approach, as far as the different disciplines will allow. Resourcing is also an issue, not just in identifying capital funding to deliver a strong pilot but also in funding a sustainable, scalable solution. The proposed Pathfinder project, in which a number of institutions will look to build a pilot RDS, should clarify some issues and identify a way forward. Regardless of the future direction, a research data service will need a robust infrastructure and ongoing resourcing.