JISC RDM Training Workshop, 26th October 2012

With thanks to Jacqueline Cooke, Librarian (Acting), Goldsmiths, University of London, for the following blog post. This workshop was held primarily for the new JISCMRD Research Data Management training projects (2012-13); however, other JISCMRD projects were invited to attend, and Jacqueline Cooke kindly represented KAPTUR.

The themes of the day were:

  • Librarians’ role in RDM training
  • design of training sessions
  • advocacy
  • components of good research data management
  • options for publishing data

The first presentation was from the Digital Curation Centre’s (DCC) Research Data Management Skills Support Initiative – Assessment, Benchmarking and Classification (DaMSSI-ABC) project (2012-13). This project has an overarching brief to support and improve coherence in the development, dissemination and reuse of research data management training materials developed by the JISC RDMTrain projects (2010-11). They will also make links with existing initiatives that promote information literacy for researchers, such as the Research Information and Digital Literacies Coalition (RIDLs) and Vitae, referring to the Vitae Information literacy lens (PDF) on the Vitae Researcher Development Framework.

On a practical level they will support classification and deposit of projects’ training materials into JORUM so they are more easily discoverable for reuse through a JORUM ‘lens’. They will also work strategically to:

  • make links with relevant professional bodies
  • develop criteria for ‘peer review’ of training courses
  • add RDM training to the career profile of librarians

Librarians’ role in RDM training

The strand supports the role of librarians in RDM training, as an extension of their information literacy portfolio and building on their professional ability to act as signposters. See also the Reskilling for Research (PDF) report by Mary Auckland for Research Libraries UK.

RDMRose is looking at taught and CPD learning for information professionals. Initially they suggest that librarians have the potential to carry out RDM training but will need to extend their professional identity and build on their existing roles and skills. Many lack knowledge of research culture and need to understand it in order to be trusted.

At the University of East London (UEL) the RDM project builds on the Library’s established lead in RDM. They point out that the Library has a reputation for collaborative projects: it is credible, has proven expertise in collecting, cataloguing and compliance (copyright, managing the CLA licence), values sharing, cares about impact through citation, and runs the repository.

Design of training sessions

The SoDaMaT (Sound Data Management Training for electronic music) project at Queen Mary, University of London, and UEL talked about the design of training sessions. There was general discussion, as many attendees had previous experience. All were cautious about generic workshops, since researchers in different disciplines or departments work in different contexts and environments, and so advised considering carefully who the training is aimed at.

Tips from the experienced ones:

  • keep it short (1 hour optimum)
  • include technical basics (formats, storage, use of folders) as well as theory
  • attach it to other training that is seen as essential or valuable
  • don’t call it ‘digital preservation’
  • fit it into existing research skills programmes especially for post-graduate researchers
  • check consistency of advice with other training on RDM e.g. qualitative data training
  • provide online as well as face-to-face sessions and integrate them

Advocacy

Buy-in from your institution’s senior management team is essential; they are now more likely to be receptive given the current high profile of the Finch report, Open Access agendas and the impact on research funding. SoDaMaT suggest that researchers are engaged by evidence: they use dramatic stories of data loss and point out the IPR consequences of ‘curation in the cloud’. The University of Leicester’s RDM web page presents it as a scenario:

“What would you do if you lost your research data tomorrow? RDM isn’t principally about complying with policy. It means helping you to complete your research, share the research and get credit for what you have done.”

Effective advocacy emphasises the value of RDM to researchers to make the business case for introducing training:

  • saves researchers’ time looking up previous work
  • helps you get funding
  • it is like ethics: doing it well enables you to do your research better (UEL)
  • sends your research into the future and enables citation of data alongside articles

Components of good RDM

Good practice in RDM is usually boiled down to four steps, expressed variously as:

  • SoDaMaT: Preserve, Document, Organise, Publish
  • Incremental: Plan, Store, Explain, Share
  • University of Leicester: Create, Organise, Access, Look after
  • IHR/JISC: Start early, Explain it, Store it safely, Share it

Further details from: SoDaMaT’s wiki ‘Online training materials’, University of Leicester’s RDM page, the JISC ‘Incremental’ project page, LSE/Cambridge/IHR/ULCC’s ‘Sending your research material into the future’ project.

Options for publishing data

The trainer needs to ask questions about the data and about working practices and agree a definition of data, because “researchers have many ways to approach RDM on their own terms” (UEL).

  • What data is available? (e.g. in science raw data/usable data/datasets/supporting material/all worked data)
  • Who decides what data to save and give access to? (Referee? RCUK? PI?)
  • Where can data be published? (national data archives/learned societies website/institutional repositories/journals). Not all of these will be available in all disciplines.

There was a discussion of data publication issues, covering:

  • Culture change: how much awareness is there of the issues of RDM?
  • Citation of data supporting published articles works well if publishers hold it: the data package then gets a DOI (see the Dryad project, and the sketch after this list)
  • Publishers/learned societies say they will do what communities want; there is therefore an opportunity to influence the development of other players
  • Publishers should not take IPR of data; advise use of a CC-BY licence if possible
  • EPSRC institutional ‘Roadmap to research data management’ includes training
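
As a toy illustration of the DOI point above: a dataset DOI resolves to its landing page via the doi.org redirect service. A minimal sketch in Python, assuming an invented placeholder identifier (not a real Dryad deposit):

    # Minimal sketch: resolve a dataset DOI to its landing page via
    # the doi.org redirect service. The DOI below is an invented
    # placeholder, not a real deposit; substitute a real one to run.
    import urllib.request

    doi = "10.5061/dryad.example"  # hypothetical identifier
    with urllib.request.urlopen("https://doi.org/" + doi) as resp:
        print("Resolved to:", resp.geturl())  # final URL after redirects

The same resolution mechanism is what makes a data citation in an article reliably point back to the data package.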

Methodology for the Environmental Assessment – shared

As discussed during the JISCMRD Programme Launch in Nottingham, projects thought it would be good to share what each one is doing regarding the Data Asset Framework (DAF) and/or gathering user needs, in order to see whether it can be used or re-used by the other projects. We have been describing Kaptur’s approach, and the rationale for this approach, in a series of blog posts. Previous posts on this topic can be found by searching the tag ‘environmental assessment’; the two most relevant are ‘Methodology for the Environmental Assessment’ and ‘Environmental Assessment interview questions’. Feedback is welcomed.

Kaptur is not using DAF, although we have considered what can be learned from the DAF approach. DAF provides institutions with a means to:

“identify, locate, describe and assess how they are managing their research data assets”

DAF Screencast

A comprehensive website is available: http://www.data-audit.eu/. This includes the DAF Implementation Guide (PDF).

DAF recommends that you begin by deciding what you mean by ‘data assets’; for example, the guide mentions:

“numerical data, statistics, output from experimental equipment, survey results, interview transcripts, databases, images or audiovisual files, amongst other things”

DAF Implementation Guide (PDF), p. 7
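
To make the notion of a ‘data asset’ concrete, here is a minimal sketch of the kind of inventory record a DAF-style audit might collect. The field names are our own illustration, not a schema prescribed by DAF:

    # Hypothetical data-asset record for a DAF-style inventory;
    # the field names are illustrative, not a DAF-mandated schema.
    data_asset = {
        "name": "Probing interview transcripts",
        "type": "interview transcripts",   # cf. the asset types quoted above
        "location": "institutional shared drive",
        "owner": "Project Officer",
        "format": "DOCX",
        "access": "project team only",
    }
    print(data_asset["name"], "held on", data_asset["location"])

Identifying, locating and describing assets in this way is what the DAF quote above means in practice.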

Our initial probing interviews and research in the area of visual arts data tell us that we are not yet ready to pin this down to specific assets, although potentially all of the above could be included. One of the issues arising from the probing interviews was what counts as ‘research data’ in the first place. We decided to undertake formal interviews to gather detailed qualitative information that could better inform Kaptur and help to build relationships with visual arts researchers at the four institutions. This approach, whilst not following DAF exactly, also included questions about the types of data asset researchers were producing and how these were being managed.

The scope of the Kaptur Environmental Assessment report has been defined in our methodology, which we make available for use and re-use:

Following the imminent publication of our report, the next stage is to establish working groups in each institution, both to continue the dialogue with visual arts researchers and to encompass a wider range of stakeholders. We have been looking at the CARDIO assessment tool, particularly as it is designed to “improve communication and understanding” between stakeholders. However, CARDIO is normally used following a more formal data audit procedure, so we may adapt its approach to suit our timescales and circumstances. For example, there is a clear benefit to holding face-to-face meetings with all the stakeholders, and this will take priority; questions or elements of the CARDIO tool may nonetheless be used to inform the agenda for these meetings. This is yet to be discussed and will be raised at the Steering Group meeting on Monday as part of the Implementation Plan.


Kaptur – three months into the project (1/6)

One sixth of the way through Kaptur, and this is our update for the third month:

1. Project Outputs

  • consortium agreement – in the process of being signed (delayed over Christmas; now expected before the end of January)

2. Environmental Assessment

  • The 16 one-hour recorded interviews have now been transcribed. Each Project Officer has been reviewing the transcripts, marking them up and checking that they are anonymised, so that they can be analysed collaboratively on Monday 9th and Tuesday 10th January 2012.
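
As a toy sketch of the kind of find-and-replace pseudonymisation such a check involves (the names, codes and approach are invented for illustration, not Kaptur’s actual procedure):

    # Toy pseudonymisation sketch: replace known names with codes
    # before analysis. The names and codes are invented examples.
    import re

    pseudonyms = {
        "Jane Smith": "[Researcher A]",
        "Example College": "[Institution 1]",
    }

    def anonymise(text, mapping):
        for name, code in mapping.items():
            text = re.sub(re.escape(name), code, text)
        return text

    print(anonymise("Jane Smith works at Example College.", pseudonyms))
    # -> [Researcher A] works at [Institution 1].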

3. Dissemination

4. Issues/challenges

December is always a challenging month (due to leave and tying up loose ends), which is why we pressed ahead with the project work so quickly during October and November. During December we continued to build links with other projects, with the DCC, and internationally at the IDCC conference. The biggest issue was making sure that everything was in place for the data analysis in early January, including the transcripts and the venue. We will be meeting at Goldsmiths, University of London, and a blog post will follow here about our analysis.


#jiscmrd programme launch – Commonalities

Simon Hodson, JISCMRD Programme Manager, has asked all projects to do a short blog post about commonalities.

View from the National College for School Leadership, Nottingham. Photo: MTG

Kaptur has previously highlighted commonalities with the first round of JISCMRD programme funding (2009-11): we plan to use training materials produced by Project CAiRO and have also spent time looking at JISC Incremental. The commonalities identified so far from the JISCMRD Programme launch are:

1. Disciplinary

The session on the last day put a few of the projects together in an ‘Arts and Humanities’ group. Some of the projects that are particularly relevant to us are:

2. Pilot infrastructure

Kaptur is one of 17 projects in Strand A of the JISCMRD programme (see Simon Hodson’s blog post on this); we are therefore seeking both to learn lessons from the more experienced projects in this strand (those with previous JISCMRD funding or links) and to find out how similar pilot projects are approaching things.

3. Approach

  • During the Programme Launch there was a lot of talk about DCC tools including DMP Online, DAF, and CARDIO – look out for a future blog post about our environmental assessment methodology.
  • Also keen to learn lessons from the MaDAM project, which is now MiSS (MaDAM into Sustainable Service) – http://www.miss.manchester.ac.uk/ (great URL!)
  • Research360@Bath looks good too!

Please let me know if I have overlooked any projects that are relevant to Kaptur – we are interested in engaging with other projects and welcome feedback!


The DCC Roadshow in Cambridge, Day One

The following blog post has been written by Tahani Nadim, Kaptur Project Officer, Goldsmiths, University of London.

The sixth DCC Roadshow on data management, organized in conjunction with Cambridge University Library, began with DCC’s own Associate Director, Graham Pryor, highlighting the current big theme, summarized by the “3 Rs”: re-use, regeneration and repurposing of data. His talk focused on the scale and complexity of data generation in all sciences, though, once more, the “hard” sciences received most attention, with examples like the Large Hadron Collider (15 petabytes of data annually) and GenBank, the NCBI’s nucleotide sequence database (holding approx. 130 billion bases in 140 million sequence records in the traditional GenBank divisions). Nathan Cunningham, of the British Antarctic Survey’s (BAS) Polar Data Centre, gave some dazzling and dizzying examples of the range and complexity of data produced by the BAS – “data bling” and “Disney science”, as he called it. Some of the challenges faced by Cunningham and colleagues relate to turning unstructured data into structured data; describing data in such a way as to make it discoverable and useable; and, importantly, finding ways to automate this.

For Cunningham, so-called data “mash-ups” (combining data on e.g. sea surface temperature, feeding routes of penguins, chlorophyll levels or high-resolution sea ice images) provide decision-making tools as well as diagnostic tools. David Shotton, a cell biologist turned bioinformatics guru, made very similar arguments for the biosciences. Introducing a host of data curation projects, particularly ones focused on digital imaging, Shotton pointed to reasons why many researchers still do not publish their data: information and work overload; pressure for financial viability (to get money for their departments); cognitive overheads and skills barriers. The latter was also very clear from Cunningham’s presentation: data curation requires specialised knowledge of the data-generating discipline and more often than not cannot be ‘delegated’.

The presentations by Pryor, Cunningham and Shotton left little doubt that data sets are becoming the new instruments of science and are establishing new ways of working (e.g. collaborative modelling in a global virtual laboratory, as done in the neurosciences by the CARMEN project), but this poses a number of critical questions for researchers and institutions alike: Who will analyse all this data, and how? Is digital data the new special collections? Regarding regulation, Pryor noted that in some cases, for example European IP law, regulation actively obstructs data sharing as well as digital preservation. He also voiced concerns about the handling of data management requirements in research councils’ policies, pointing in particular at the EPSRC’s timescale and vague language.

In terms of providing access to this data, Pryor introduced some commendable initiatives such as the Panton Principles, as well as open science applications such as the Citizen Science Alliance. Again, open data throws up a lot of questions: How to be “open”, and how far to go with being “open”? What are the incentives for being “open”? How to handle sensitive data (particularly in the biomedical sciences)? One study of the current handling of research data mentioned by Pryor, the Incremental project, was later described in more detail by Elin Stangeland of the University of Cambridge’s DSpace repository. A JISC-funded collaboration between Cambridge and the University of Glasgow, the project produced a scoping study before drawing together guidance and support literature, providing training in data curation and creating audiovisual learning resources.

A different perspective was offered by Dr Anne Alexander; a doubly different perspective, in fact, since this presentation came from a researcher in the humanities. Alexander’s research focuses on Middle Eastern politics, particularly labour movements and similar political movements in the region. Her current project, which looks at the Egyptian revolution, demonstrates the dramatic transformation in the data resources she engages with. She commenced her presentation with an image of her usual data, such as notes, newsletters, newspapers and analogue tapes; the remaining part of her talk was accompanied by Facebook pages, Twitter feeds, YouTube videos and other social media platforms. Alexander argued that political actors have radically taken up the novel spaces offered by social media: the strike committee of sugar refinery workers in Egypt, the strike committee of doctors in Egypt, and the ruling military council all have Facebook pages which are actively enrolled in their respective political practices.

The problems faced by the researcher are plentiful: How to capture (save, store, make discoverable, etc.) not just the discrete data entity (the tweet, the video, the picture, the status update) but also the context, that is, the comments, the other “recommended” or “related” content, and other dynamically created relations and objects? Another issue pertains to the difference between public and published: pulling comments made by activists against authorities out of the digital realm (e.g. a Facebook wall) and committing them to paper, and/or circulating them by other means and routes, poses serious ethical questions. Equally confounding is the problem of “ownership” raised in the discussion: if everything is owned by Facebook, what is a researcher to do?
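
As a minimal sketch of how far a naive ‘capture’ gets, and why the context is the hard part, the following (assuming Python and a hypothetical public URL) saves a page together with basic capture metadata. The dynamically generated comments, recommendations and related content described above would not survive such a snapshot:

    # Minimal sketch: snapshot a public page plus capture metadata.
    # The URL is a placeholder. Note what is lost: dynamically loaded
    # comments, "related" items and other context are not captured.
    import json
    import urllib.request
    from datetime import datetime, timezone

    url = "https://example.org/some-public-page"  # hypothetical URL
    with urllib.request.urlopen(url) as resp:
        html = resp.read()

    with open("snapshot.html", "wb") as f:
        f.write(html)

    metadata = {
        "source_url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "bytes": len(html),
    }
    with open("snapshot.json", "w") as f:
        json.dump(metadata, f, indent=2)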

In conclusion, Alexander suggested that it is not helpful to think of the Internet as an infinite archive. This gives us a false sense of security. Instead, researchers need to acquire archival skills.