Managing and citing sensitive data

With thanks to Anne Spalding, Kaptur Project Officer, University for the Creative Arts, for the following account of DataCite’s Managing sensitive data workshop, The British Library, London, 29th October 2012.

On Monday 29th October I attended my first DataCite workshop; this particular workshop is the third in a series. Slides from this and previous workshops are available via The British Library Datasets web pages.

During the morning session there were four presentations followed after lunch by a workshop where four groups focussed on data management scenarios. Feedback from the workshops and a general discussion rounded off the day.

The first speaker, Veerle Van den Eynden, spoke about managing sensitive data, drawing on the UK Data Archive’s experience. She explained in broad terms the legal aspects and also the role that research ethics, data archives and repositories play in the management of research data.

Jonathan Tedds from the BRISSkit project spoke of managing medical and personal data. As part of the project a survey of 3000 staff was conducted in 2010 regarding their own use and re-use of research data. In due course a summary of their findings will be available as part of the project outcomes. Jonathan emphasised the need to make the process of depositing data more engaging for researchers. Jonathan mentioned work in managing research data undertaken by the University of Virginia Library.

From UKOLN, Cathy Pink gave a very interesting presentation on working with commercial partners as part of the Research 360 project. One focus of the project is on the issues and challenges that arise from private sector partnerships and research collaborations. Cathy illustrated the different collaboration agreements that are in place at Bath University. Another important aspect of citing and discovering research data is the use of metadata and Cathy cited the work of Sally Rumsey ‘Just Enough Metadata’.

The final presentation was given by Brian Mathews of the Science and Technology Facilities Council (STFC). Brian’s talk focussed on some issues in research ethics arising from data sharing, and on the fact that we are working in a political environment. He referred to the Opportunities for Data Exchange (ODE) project and a paper entitled ‘Ten Tales of Drivers and Barriers in Data Sharing’.

One of the main discussion points emerging from the workshops and feedback was the use of Digital Object Identifiers (DOIs). A particular issue was how to handle a DOI assigned to a single object that could change over time: how should the change be noted, and is another DOI required? Could an umbrella DOI be assigned for the whole object but somehow allow for changes? Solutions for handling this might depend on work practices within institutions.

This event provided me with a further insight into the complexities of managing research data. The variety of perspectives also demonstrated that we are all grappling with the same issues but might well adopt different solutions dependent on the institutional environment.


KAPTUR one year on – (2/3)

This is our update for the end of the twelfth month of KAPTUR; we are just past the two-thirds mark! For an overview of the past year, please visit the KAPTUR Prezi.

WP1: Project Management

  • The Project team have been in contact by telephone and email; four colleagues will be attending the JISCMRD Programme meeting this week in Nottingham.

WP3: Technical Infrastructure

WP4: Modelling

  • The four policies are going through several rounds of committees and are on schedule; this has been the focus of the past month.
  • In addition the University of the Arts London’s draft policy is available online: http://www.arts.ac.uk/research/data-management/

WP5: Training and Support

  • The KAPTUR training plan is now publicly available.
  • The Pinterest links have been linked to from UAL’s RDM pages and from the excellent blog post by DCC’s Marieke Guy on The value of video in getting the RDM message across.
  • The GSA Project Officer taught MRes students about research terminology, covering research data and promoting the KAPTUR project; this will feed into our training materials. Blog post about this: Getting to grips with research terminology
  • The Project Officers have been in contact with their Research Offices to arrange a half-day training session for Research Office staff and Librarians in order to pilot the KAPTUR training materials.

WP6: Evaluation and Sustainability

  • The Project Officers have received a short Word document and model costings template (Excel) and will be piloting this within their own institutions.
  • Detailed case study templates have been created and shared with the Project Officers. The case studies will be presented at the end-of-project conference on Wednesday 6th March 2013.

WP7: Dissemination


Minting DOIs for research data in the UK

‘Coin press at New Orleans Mint Museum’ 
Attribution, No Derivative Works. Some rights reserved by Ted Drake

Last week’s DataCite workshop was a really good opportunity to ask questions about DataCite at The British Library, how to mint a DOI (Digital Object Identifier), and to discuss challenges with citing research data.

Data Citation

The day started with a challenge to the presenters – what is data? This discussion had echoes of KAPTUR’s own research question – what is visual arts research data? (Environmental Assessment report). It seems almost impossible to define research data due to its diversity, but a working definition is obviously necessary; a good example is from University of Bristol’s Glossary.

The British Library’s Head of Scientific, Technical & Medical Information, Lee-Ann Coleman, spoke about the importance of making research data available, mentioning examples including the virologist Ilaria Capua, who opened up worldwide access to avian flu virus data sequences, and the open-data journal GigaScience’s research into E. coli. A recent addition, ISO 26324:2012 for DOIs, was mentioned. Garfield’s 15 reasons ‘when/why to cite?’ was a useful point of reference too:

  1. Paying homage to pioneers.
  2. Giving credit for related work (homage to peers).
  3. Identifying methodology, equipment etc.
  4. Providing background reading.
  5. Correcting one’s own work.
  6. Correcting the work of others.
  7. Criticizing previous work.
  8. Substantiating claims.
  9. Alerting researchers to forthcoming work.
  10. Providing leads to poorly disseminated, poorly indexed, or uncited work.
  11. Authenticating data and classes of fact – physical constants, etc.
  12. Identifying original publications in which an idea or concept was discussed.
  13. Identifying the original publication describing an eponymic [sic] concept or term as, e.g., Hodgkin’s disease, Pareto’s Law, Friedel-Crafts Reaction, etc.
  14. Disclaiming work or ideas of others (negative claims).
  15. Disputing priority claims of others (negative homage).

Garfield, E., 1996. When to Cite. In: Library Quarterly 66 (4), 449-458. Available from: http://www.garfield.library.upenn.edu/papers/libquart66(4)p449y1996.pdf [Accessed 25 May 2012].

What is DataCite? 

Elizabeth Newbold provided an introduction to DataCite, a not-for-profit international registration agency for DOIs that facilitates the citing of research data. Founded in December 2009, it consists of a Managing Agent (currently the German National Library of Science and Technology (TIB)) and regional Members. In the UK The British Library is the regional Member, which works with ‘Data Clients’ such as the UK Data Archive, amongst other data centres and repositories. DOIs are assigned between the regional Member (e.g. The British Library) and its Data Clients (e.g. UK Data Archive), i.e. on an institution-to-institution basis. If an individual researcher wants a DOI then they need to contact the appropriate Data Client for their subject discipline; a list of some existing and potential future Data Clients is maintained on the DataCite website. Data Clients must fulfil a number of requirements and pay an annual fee to The British Library.

Some of the requirements for Data Clients:

  • DOIs must resolve to a publicly accessible landing page even if the data itself is not open; the landing page can be an existing set of Web pages with the Data Client’s style so long as it is updated to include the DataCite information.
  • Mandatory metadata fields: 4 fields (5 if you include the DOI itself) – these should be subject discipline agnostic: http://schema.datacite.org/
  • The mandatory metadata must be freely available for discovery purposes, specifically under a Creative Commons CC0 licence; there was some interesting discussion around this and some issues to be resolved.
  • Data Clients should have a formal data preservation plan (this may include disposal policies and so on); an operational service level agreement (SLA); and a clear intention in a mission statement to preserve and maintain the DOIs, which could include reference to an EPSRC Roadmap. Action: DataCite will share a draft SLA with the attendees.

How to mint a DOI – case study

Louise Corti of the UK Data Archive provided a very useful mini-case study and I’ll link to her presentation here when it is available. As data providers the UK Data Archive want to use citations to improve resource access and discovery. It was really interesting to hear how DOIs are affected by changes to the research data. At the UK Data Archive minor changes (e.g. a spelling mistake or typo) are documented in their Change Log but the DOI version number stays the same; major changes (such as an updated dataset) are documented in the Change Log field and the DOI is also given a new version number at the end. Challenges for the future include citing parts or fragments of research data, and also issues around describing relationships between data. Look out for a forthcoming UK Data Archive and ESRC brochure on citing data, aimed at the Social Science community.
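The minor/major rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the UK Data Archive's actual system: the function name `apply_change`, the shape of the change log, and the DOI itself (with a made-up study number and the version assumed to be the integer after the final hyphen) are all hypothetical.

```python
import re

# Assumption: the version is the integer after the final hyphen of the DOI,
# e.g. "10.5255/UKDA-SN-0000-1" (the study number here is made up).
_VERSIONED = re.compile(r"^(?P<stem>.+)-(?P<version>\d+)$")


def apply_change(doi, change_log, description, major):
    """Record a change in the log and return the DOI to cite afterwards."""
    match = _VERSIONED.match(doi)
    if match is None:
        raise ValueError(f"no trailing version number in {doi!r}")
    if major:
        # Major change (e.g. an updated dataset): bump the trailing version.
        version = int(match.group("version")) + 1
        doi = f"{match.group('stem')}-{version}"
    # Minor and major changes alike are documented in the change log.
    change_log.append({"doi": doi, "major": major, "description": description})
    return doi


log = []
doi = "10.5255/UKDA-SN-0000-1"
doi = apply_change(doi, log, "Fixed a typo in the title", major=False)
doi = apply_change(doi, log, "Added an updated data file", major=True)
print(doi)
```

Only the second call changes the DOI; the typo fix is logged against the existing version.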

How to mint a DOI – the technical bit

An illuminating presentation from Ed Zukowski described the following components of the DataCite systems:

The Data Client will be provided with information from the regional Member in order to make use of the Metadata Store and the facility to mint DOIs; technical knowledge is required to use the API for bulk registration. For minting one DOI an XML file is required with at least the four mandatory fields of metadata using the DataCite Schema.
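As a rough idea of what such an XML file might contain, here is a minimal Python sketch that builds a record with the DOI plus four mandatory fields (creator, title, publisher, publication year). The namespace below assumes one particular schema version and should be checked against http://schema.datacite.org/; the DOI, names and helper function `build_record` are placeholders, not real registrations or part of any DataCite tool.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the schema version in use; check
# http://schema.datacite.org/ for the version your Data Client is given.
NS = "http://datacite.org/schema/kernel-2.2"


def build_record(doi, creators, title, publisher, year):
    """Return a metadata record with the DOI and four mandatory fields."""
    ET.register_namespace("", NS)  # serialise with a default namespace
    root = ET.Element(f"{{{NS}}}resource")
    identifier = ET.SubElement(root, f"{{{NS}}}identifier", identifierType="DOI")
    identifier.text = doi
    creators_el = ET.SubElement(root, f"{{{NS}}}creators")
    for name in creators:
        creator = ET.SubElement(creators_el, f"{{{NS}}}creator")
        ET.SubElement(creator, f"{{{NS}}}creatorName").text = name
    titles = ET.SubElement(root, f"{{{NS}}}titles")
    ET.SubElement(titles, f"{{{NS}}}title").text = title
    ET.SubElement(root, f"{{{NS}}}publisher").text = publisher
    ET.SubElement(root, f"{{{NS}}}publicationYear").text = str(year)
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
xml_text = build_record(
    doi="10.5072/example-1",
    creators=["Smith, Jane"],
    title="Example visual arts dataset",
    publisher="Example Data Centre",
    year=2012,
)
print(xml_text)
```

A real submission would also need validating against the schema and sending through the Metadata Store with the credentials supplied by the regional Member; neither step is shown here.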

The user will resolve a DOI (e.g. using a system such as http://dx.doi.org/) through the Global Handle Registry, which includes information from the Handle Server hosted by the DataCite Managing Agent. Resolving a DOI takes the user to a landing page and collects statistics about how many times a DOI has been resolved.
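From the user's side, resolving amounts to appending the DOI to the resolver address. A small sketch, assuming the dx.doi.org resolver mentioned above; the helper name `doi_to_url` and the example DOI are made up for illustration.

```python
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # the resolver mentioned in the text above


def doi_to_url(doi):
    """Turn a DOI such as '10.5072/example-1' into a resolvable link."""
    prefix, sep, suffix = doi.partition("/")
    # A DOI has a prefix starting with "10." and a non-empty suffix.
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters in the suffix that are unsafe in a URL.
    return RESOLVER + prefix + "/" + quote(suffix, safe="/")


print(doi_to_url("10.5072/example-1"))
```

Following that link is what sends the request through the resolver to the landing page.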

There is a free search of existing DataCite DOIs. From the top right of the Search page select ‘Options’ and ‘enable’ the Filter Preview, then when you do a search it is possible to filter by individual regional Member (‘allocator’) and Data Client (‘datacentre’).

The OAI-PMH Data Provider is available here: http://oai.datacite.org/
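For anyone who has not used OAI-PMH before, a harvesting request is just a URL with a `verb` and a few standard parameters. The sketch below builds a `ListRecords` request; the `/oai` path on the end of the base address is an assumption, as is the helper name, so check the provider's own page before relying on it.

```python
from urllib.parse import urlencode

# Assumption: the endpoint path; the text above only gives the site address.
BASE_URL = "http://oai.datacite.org/oai"


def list_records_url(metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest records changed since this date
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one set
    return BASE_URL + "?" + urlencode(params)


url = list_records_url(from_date="2012-01-01")
print(url)
```

`oai_dc` (simple Dublin Core) is the metadata format every OAI-PMH provider has to support, which is why it is the default here.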

http://data.datacite.org/ – provides two ways of exposing metadata held in the Metadata Store:

  1. HTML links i.e. hyperlinks in a standard Web browser.
  2. HTTP Content Negotiation – ‘I say what I want and in what priority’ e.g. ‘I want a PDF version of the research data but if there is an HTML version I’ll take it’ – if there is a PDF version available, content negotiation will take you straight to the PDF rather than to the landing page, for example.
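The ‘what I want and in what priority’ part is carried in the HTTP `Accept` header, where lower `q` values mean lower preference. A minimal sketch using Python's standard library; it only prepares the request and does not send it. Appending the DOI directly to http://data.datacite.org/ is an assumption about the URL layout, and the DOI and helper names are placeholders.

```python
from urllib.request import Request


def accept_header(preferences):
    """Build an Accept header from (media_type, q) pairs, best first."""
    parts = []
    for media_type, q in preferences:
        # q=1.0 is the default weight, so it can be left out.
        parts.append(media_type if q == 1.0 else f"{media_type};q={q:.1f}")
    return ", ".join(parts)


def negotiated_request(doi, preferences):
    """Prepare (but do not send) a request stating the preferred formats."""
    # Assumption: the DOI is appended directly to the service address.
    url = "http://data.datacite.org/" + doi
    return Request(url, headers={"Accept": accept_header(preferences)})


# "I want a PDF, but if there is an HTML version I'll take it."
request = negotiated_request(
    "10.5072/example-1",
    [("application/pdf", 1.0), ("text/html", 0.5)],
)
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would send it; which formats actually come back depends on what the data centre has registered.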

Contact datasets@bl.uk to ask for access to the test site which enables you to mint ‘temporary’ DOIs. See also: https://github.com/datacite

A really useful tool to format DOIs into Harvard system citations (and other citation systems) in multiple languages: http://crosscite.org/citeproc/
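To show what such a tool has to do, here is a tiny sketch that turns the mandatory metadata fields into a simple author-date citation. It is an illustration only, not the crosscite tool's output or an official Harvard style; the function name, punctuation choices and example values are all assumptions.

```python
def format_citation(creators, year, title, publisher, doi):
    """Format mandatory metadata as a simple author-date citation."""
    if len(creators) > 1:
        # Join all but the last name with commas, then add "and".
        authors = ", ".join(creators[:-1]) + " and " + creators[-1]
    else:
        authors = creators[0]
    return f"{authors} ({year}) {title}. {publisher}. doi:{doi}"


citation = format_citation(
    creators=["Smith, J.", "Jones, A."],
    year=2012,
    title="Example visual arts dataset",
    publisher="Example Data Centre",
    doi="10.5072/example-1",
)
print(citation)
```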

Breakout groups on challenges with citing research data (some questions):

  • Selection process – what about raw data? when does data become citable?
  • Why not use DOIs for Ph.D. theses?
  • Do you need to mint DOIs before you publish the journal article so you can link to them? – could start minting DOIs at collection level then move into additional specific parts nearer to publication of the journal article?
  • A need to define roles and responsibilities.
  • What about changes to Data Clients or funding bodies?
  • How does versioning work with DOIs? (note UK Data Archive case study above)
  • What is a citable unit of research data?
  • What about cross-institutional, international, or cross-disciplinary research? Who mints the DOIs?
  • A need for DataCite to provide case studies, perhaps with future workshops.
  • It is only possible to describe one resource type per DOI (and this is a fixed controlled list e.g. Image, Film, etc) – this may be problematic with visual arts e.g. an exhibition; how do you describe complex relationships?

For cost/charge plans – discuss with the UK regional Member via datasets@bl.uk

The next DataCite workshop will be on metadata on Friday 6th July; details will be published online in due course.



Kaptur – six months into the project (1/3)

One third of the way through the project, and this is our update for the end of the sixth month:

WP1: Project Management

WP3: Technical Infrastructure

  • The Technical Analysis report has been through several iterations; the user requirement component has been sent to the partner institutions for final feedback; once this is received the requirements testing will take place leading to the choice of technical system for the pilot.

WP4: Modelling

  • The Project Officers reported on the trends in funding at their institutions (blog post)
  • Three of the four Project Officers attended the JISCMRD two-day workshop on institutional RDM policies (12-13th March, Leeds); this was extremely beneficial for Kaptur for several reasons:
    1. using the Chatham House Rule the JISCMRD projects could talk openly and plainly about the reality of creating and seeking approval for institutional RDM policies
    2. we had an opportunity to really understand the processes and workflows from more experienced projects (i.e. those who had received funding in the previous JISCMRD round 2009-11 or who already had institutional RDM policies)
    3. it was very interesting to hear how other JISCMRD projects were making use of the CARDIO and DAF tools from the Digital Curation Centre – we will be discussing this at our next project team meeting in April
    4. there was also the opportunity to ask questions of select representatives of the Research Councils UK (RCUK) which was very illuminating, particularly in terms of the EPSRC Expectations
    5. as most of the project team were able to attend we could discuss and share our own views over the course of the two days and come to a consensus of opinion – i.e. that we were aiming for a high-level aspirational policy based on University of Edinburgh’s policy
  • An RDM Discussion paper was drafted and was an agenda item at the UCA Research and Enterprise Committee meeting on 30th March; this Committee also has the role of approving an institutional RDM policy.
  • Representatives from 2 of the partner institutions attended the JISCMRD Data Management Planning (DMP) end of project event (23rd March) – this was useful in terms of discussion throughout the day, lessons learned from other projects, and also take-home resources which we may be able to implement – as well as a sneak peek at the new and improved version of the DCC’s DMP Online tool due to launch soon.

WP7: Dissemination

  • As mentioned above, 3/4 institutions attended the JISCMRD policies workshop and 2/4 attended the DMP end of project workshop (both March 2012).
  • Promotion of the Environmental Assessment report (blog post)
  • Beginning of an idea for more creative publicity material for Kaptur, to be followed up at our next project team meeting
  • The Project Manager gave a presentation on Kaptur to British Library staff as part of their Digital Conversations event (blog post)
  • The Project Director and Project Manager co-authored a written paper on Kaptur for the EVA London 2012 conference

4. Issues/challenges

As we are now a third of the way through the project it is a good point for reflection on both the work already accomplished as well as the work still to be done. Our focus continues to be on producing a pilot model for the visual arts sector and drawing on the strength of the collaboration across four partner institutions. Added to this is a growing sense of community across the JISCMRD programme (2011-13) which has benefited the Kaptur project team.


Kaptur at The British Library


Gateway detail, The British Library (1978-97)
by Colin St John Wilson.
Photo: Steve Cadman License: CC BY-SA 2.0

The second (official) Digital Conversations @ British Library took place on Friday 30th March, hosted by the Digital Research and Curator Team (more information in a staff newsletter available via ISSUU). The theme for the event was ‘Annotation and Sharing’. It was a privilege to attend this internal staff event, and also to have an opportunity to present Kaptur, with a focus at this stage in the project on sharing (the Prezi is available here: http://prezi.com/0m_ql5don6vy/kaptur-bl/).

Brief notes about the other presentations are below:

Jan Reichelt, president and co-founder of Mendeley – “a free reference manager and academic social network” – spoke about some of the current features (e.g. annotating PDFs) and possible future developments, e.g. Kleenk, a visual map of connections between your paper and other papers. Described as “the first semantic network of scientific content”, it integrates with Mendeley through its API. It was also interesting to hear that Mendeley’s recommended article feature has around an 80% success rate with users (based on stats from the last year).

Richard Ranft, Head of Sound & Vision at The British Library, spoke about some innovative BL Sound projects.

The JISC funded eMargin project was presented by Andrew Kehoe and Matt Gee. It’s a great tool for “underlining and colour-coded highlighting […] notes and comments” on a range of text file formats and sharing these across groups; it has features which are not currently available in other similar tools and the potential to develop further. The Birmingham School of Acting are currently using a specially developed version for iPad to annotate their scripts during rehearsals. The University of Leicester will be using the tool with their first year students from September. It is available here: http://emargin.bcu.ac.uk/

Debbie Harrison, Honorary Research Fellow, Birkbeck, University of London, spoke about the fascinating international collaborative David Livingstone Spectral Imaging Project, in particular focusing on the publication of Livingstone’s 1871 Field Diary: A Multispectral Critical Edition. The electronic publication enables researchers to compare the original diary (including pages written across 19th century newspapers) with later published versions.

Sean Martin, Head of Architecture & Development at the British Library, spoke about the International Image Interoperability Framework (IIIF), a project funded by the Andrew W. Mellon Foundation to “collaboratively produce an interoperable framework for image delivery” and thereby address the issue of digital “image-based resources […] locked up in silos, with access restricted to bespoke, locally built applications”. Previous Mellon funded projects that have led to this latest development include:

  • Shared Canvas – “enables the construction of views by distributed collaborators, by annotating a shared “Canvas” resource which is then rendered using a presentation system”
  • Open Annotation Collaboration – “development of a shared annotation data model supportive of interoperable annotations”
  • Digital Medieval Manuscript Initiatives – enabling interoperable environments for digital medieval manuscripts