Minting DOIs for research data in the UK

‘Coin press at New Orleans Mint Museum’
Attribution, No Derivative Works. Some rights reserved by Ted Drake.

Last week’s DataCite workshop was a really good opportunity to ask questions about DataCite at The British Library, to learn how to mint a DOI (Digital Object Identifier), and to discuss challenges with citing research data.

Data Citation

The day started with a challenge to the presenters – what is data? This discussion had echoes of KAPTUR’s own research question – what is visual arts research data? (Environmental Assessment report). It seems almost impossible to define research data because it is so diverse, but a working definition is obviously necessary; a good example is the one in the University of Bristol’s Glossary.

The British Library’s Head of Scientific, Technical & Medical Information, Lee-Ann Coleman, spoke about the importance of making research data available. Her examples included the virologist Ilaria Capua, who opened up worldwide access to avian flu virus sequence data, and the open-data journal GigaScience’s research into E. coli. A recent addition, the ISO 26324:2012 standard for DOIs, was also mentioned. Garfield’s 15 reasons ‘when/why to cite?’ was a useful point of reference too:

  1. Paying homage to pioneers.
  2. Giving credit for related work (homage to peers).
  3. Identifying methodology, equipment etc.
  4. Providing background reading.
  5. Correcting one’s own work.
  6. Correcting the work of others.
  7. Criticizing previous work.
  8. Substantiating claims.
  9. Alerting researchers to forthcoming work.
  10. Providing leads to poorly disseminated, poorly indexed, or uncited work.
  11. Authenticating data and classes of fact – physical constants, etc.
  12. Identifying original publications in which an idea or concept was discussed.
  13. Identifying the original publication describing an eponymic [sic] concept or term as, e.g., Hodgkin’s disease, Pareto’s Law, Friedel-Crafts Reaction, etc.
  14. Disclaiming work or ideas of others (negative claims).
  15. Disputing priority claims of others (negative homage).

Garfield, E., 1996. When to Cite. In: Library Quarterly 66 (4), 449-458. Available from: http://www.garfield.library.upenn.edu/papers/libquart66(4)p449y1996.pdf [Accessed 25 May 2012].

What is DataCite? 

Elizabeth Newbold provided an introduction to DataCite, a not-for-profit international registration agency for DOIs that aims to facilitate the citing of research data. Founded in December 2009, it consists of a Managing Agent (currently the German National Library of Science and Technology (TIB)) and regional Members. In the UK, The British Library is the regional Member, which works with ‘Data Clients’ such as the UK Data Archive and other data centres and repositories. DOIs are assigned between the Data Member (e.g. The British Library) and its Data Clients (e.g. UK Data Archive), i.e. on an institution-to-institution basis. An individual researcher who wants a DOI needs to contact the appropriate Data Client for their subject discipline; a list of some existing and potential future Data Clients is maintained on the DataCite website. Data Clients must fulfil a number of requirements and pay an annual fee to The British Library.

Some of the requirements for Data Clients:

  • DOIs must resolve to a publicly accessible landing page even if the data itself is not open; the landing page can be an existing set of Web pages in the Data Client’s style so long as it is updated to include the DataCite information.
  • Mandatory metadata fields: 4 fields (5 if you include the DOI itself) – these should be subject discipline agnostic: http://schema.datacite.org/
  • The mandatory metadata must be freely available for discovery purposes, specifically under a Creative Commons CC0 licence; there was some interesting discussion around this and some issues to be resolved.
  • Data Clients should have a formal data preservation plan (this may include disposal policies and so on); an operational service level agreement (SLA); and a clear intention in a mission statement to preserve and maintain the DOIs, which could include reference to an EPSRC Roadmap. Action: DataCite will share a draft SLA with the attendees.

How to mint a DOI – case study

Louise Corti of the UK Data Archive provided a very useful mini-case study and I’ll link to her presentation here when it is available. As a data provider, the UK Data Archive wants to use citations to improve resource access and discovery. It was really interesting to hear how DOIs are affected by changes to the research data. At the UK Data Archive, minor changes (e.g. a spelling mistake or typo) are documented in the Change Log but the DOI version number stays the same; major changes (such as an updated dataset) are documented in the Change Log field and the DOI is also given a new version number at the end. Challenges for the future include citing parts or fragments of research data, and describing relationships between data. Look out for a forthcoming UK Data Archive and ESRC brochure on citing data, aimed at the Social Science community.
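The minor/major rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the UK Data Archive’s actual system: the `DatasetRecord` class, the made-up DOI `10.1234/example-study` and the `-N` version suffix layout are all assumptions of mine.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """A dataset whose DOI ends in a version number (hypothetical layout)."""
    doi_stem: str                 # made-up DOI stem, e.g. "10.1234/example-study"
    version: int = 1
    change_log: list = field(default_factory=list)

    @property
    def doi(self) -> str:
        # The version number is appended to the end of the DOI.
        return f"{self.doi_stem}-{self.version}"

    def record_change(self, description: str, major: bool) -> str:
        """Log every change; bump the DOI version only for major changes."""
        if major:
            self.version += 1
        kind = "major" if major else "minor"
        self.change_log.append(f"{kind}: {description}")
        return self.doi


record = DatasetRecord("10.1234/example-study")
record.record_change("fixed a typo in the abstract", major=False)  # DOI unchanged
record.record_change("replaced the data file with an updated dataset", major=True)
print(record.doi)  # 10.1234/example-study-2
```

Both changes end up in the change log, but only the updated dataset produces a new DOI version.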

How to mint a DOI – the technical bit

An illuminating presentation from Ed Zukowski described the following components of the DataCite systems:

The regional Member provides the Data Client with the information needed to use the Metadata Store and the facility to mint DOIs; technical knowledge is required to use the API for bulk registration. Minting a single DOI requires an XML file containing at least the four mandatory metadata fields from the DataCite Schema.
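To give a feel for what such an XML file might contain, here is a minimal Python sketch that builds one with the standard library. It assumes the four mandatory fields are creator, title, publisher and publication year (plus the DOI as the identifier), and it assumes the schema namespace shown in the code; both should be checked against http://schema.datacite.org/ before use. The function name, the DOI and the sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the schema version in use; check
# http://schema.datacite.org/ for the correct value.
DATACITE_NS = "http://datacite.org/schema/kernel-2.2"


def build_datacite_xml(doi, creators, title, publisher, year):
    """Return an XML string with the DOI plus the four mandatory fields."""
    ET.register_namespace("", DATACITE_NS)  # serialise as the default namespace

    def tag(name):
        return f"{{{DATACITE_NS}}}{name}"

    resource = ET.Element(tag("resource"))
    identifier = ET.SubElement(resource, tag("identifier"), identifierType="DOI")
    identifier.text = doi

    creators_el = ET.SubElement(resource, tag("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, tag("creator"))
        ET.SubElement(creator, tag("creatorName")).text = name

    titles_el = ET.SubElement(resource, tag("titles"))
    ET.SubElement(titles_el, tag("title")).text = title

    ET.SubElement(resource, tag("publisher")).text = publisher
    ET.SubElement(resource, tag("publicationYear")).text = str(year)
    return ET.tostring(resource, encoding="unicode")


# Made-up example values, not a real dataset or DOI.
print(build_datacite_xml(
    doi="10.1234/example-1",
    creators=["Smith, Jane"],
    title="Example survey dataset",
    publisher="Example Data Centre",
    year=2012,
))
```

A real submission would also need to validate against the published schema, which this sketch does not attempt.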

The user will resolve a DOI (e.g. using a system such as http://dx.doi.org/) through the Global Handle Registry, which includes information from the Handle Server hosted by the DataCite Managing Agent. Resolving a DOI takes the user to a landing page, and the system collects statistics about how many times a DOI has been resolved.

There is a free search of existing DataCite DOIs. From the top right of the Search page select ‘Options’ and ‘enable’ the Filter Preview, then when you do a search it is possible to filter by individual regional Member (‘allocator’) and Data Client (‘datacentre’).

The OAI-PMH Data Provider is available here: http://oai.datacite.org/
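For anyone who has not used OAI-PMH before, a harvesting request is just a URL with a few standard query parameters (`verb`, `metadataPrefix`, and optionally `set` and `from`). The Python sketch below only builds such a URL and does not send it. The endpoint path `/oai` and the set name `BL.EXAMPLE` are assumptions for illustration; check the provider’s own page for the real base URL and set names.

```python
from urllib.parse import urlencode

# Assumption: illustrative base URL; confirm at http://oai.datacite.org/
OAI_ENDPOINT = "http://oai.datacite.org/oai"


def list_records_url(metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (nothing is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict to one set, e.g. one data centre
    if from_date:
        params["from"] = from_date    # only records changed since this date
    return f"{OAI_ENDPOINT}?{urlencode(params)}"


# "BL.EXAMPLE" is a hypothetical set name.
print(list_records_url(set_spec="BL.EXAMPLE", from_date="2012-01-01"))
```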

http://data.datacite.org/ – provides two ways of exposing metadata held in the Metadata Store:

  1. HTML links i.e. hyperlinks in a standard Web browser.
  2. HTTP Content Negotiation – ‘I say what I want and in what priority’, e.g. ‘I want a PDF version of the research data but if there is an HTML version I’ll take it’. If a PDF version is available, content negotiation will take you straight to the PDF rather than to the landing page, for example.
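In HTTP terms, ‘what I want and in what priority’ is expressed with an `Accept` header, where lower-priority formats carry a smaller `q` value. The Python sketch below builds such a request without sending it. The helper names are my own, and the URL with its made-up DOI is only illustrative; which formats a given DOI can actually be served in depends on what the Data Client has registered.

```python
from urllib.request import Request


def accept_header(preferences):
    """Build an Accept header from (media_type, quality) pairs, best first."""
    ordered = sorted(preferences, key=lambda pair: pair[1], reverse=True)
    return ", ".join(
        media if quality >= 1 else f"{media};q={quality:.1f}"
        for media, quality in ordered
    )


def negotiated_request(url, preferences):
    """Prepare (but do not send) a request carrying the Accept header."""
    return Request(url, headers={"Accept": accept_header(preferences)})


# 'I want a PDF, but if there is an HTML version I'll take it.'
request = negotiated_request(
    "http://data.datacite.org/10.1234/example-1",  # made-up DOI
    [("text/html", 0.5), ("application/pdf", 1.0)],
)
print(request.get_header("Accept"))  # application/pdf, text/html;q=0.5
```

Passing the prepared request to `urllib.request.urlopen` would send it; the server then decides which of the listed formats to return.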

Contact datasets@bl.uk to ask for access to the test site which enables you to mint ‘temporary’ DOIs. See also: https://github.com/datacite

A really useful tool to format DOIs into Harvard system citations (and other citation systems) in multiple languages: http://crosscite.org/citeproc/

Breakout groups on challenges with citing research data (some questions):

  • Selection process – what about raw data? when does data become citable?
  • Why not use DOIs for Ph.D. theses?
  • Do you need to mint DOIs before you publish the journal article so you can link to them? – could start minting DOIs at collection level then move into additional specific parts nearer to publication of the journal article?
  • A need to define roles and responsibilities.
  • What about changes to Data Clients or funding bodies?
  • How does versioning work with DOIs? (note UK Data Archive case study above)
  • What is a citable unit of research data?
  • What about cross-institutional, international, or cross-disciplinary research? Who mints the DOIs?
  • A need for DataCite to provide case studies, perhaps with future workshops.
  • It is only possible to describe one resource type per DOI (and this is a fixed controlled list e.g. Image, Film, etc) – this may be problematic with visual arts e.g. an exhibition; how do you describe complex relationships?

For cost/charge plans – discuss with the UK regional Member via datasets@bl.uk

The next DataCite workshop will be on metadata, on Friday 6th July; details will be published online in due course.

Some other links: