With thanks to Jacqueline Cooke, Librarian (Acting), Goldsmiths, University of London, for the following blog post. This workshop was held primarily for the new JISCMRD Research Data Management training projects (2012-13); however, other JISCMRD projects were invited to attend, and Jacqueline Cooke kindly represented KAPTUR.
The themes of the day were:
- Librarians’ role in RDM training
- design of training sessions
- components of good research data management
- options for publishing data
The first presentation was from the Digital Curation Centre’s (DCC) Research Data Management Skills Support Initiative – Assessment, Benchmarking and Classification (DaMSSI-ABC) project (2012-13). This project has an overarching brief to support and improve coherence in the development, dissemination and reuse of research data management training materials developed by the JISC RDMTrain projects (2010-11). They will also make links with existing initiatives that promote information literacy for researchers, such as the Research Information and Digital Literacies Coalition (RIDLs) and Vitae, referring to the Vitae Information literacy lens (PDF) on the Vitae Researcher Development Framework.
On a practical level they will support classification and deposit of projects’ training materials into JORUM so they are more easily discoverable for reuse through a JORUM ‘lens’. They will also work strategically to:
- make links with relevant professional bodies
- develop criteria for ‘peer review’ of training courses
- add RDM training to the career profile of librarians
Librarians’ role in RDM training
The strand supports the role of librarians in RDM training, as an extension of their information literacy portfolio and building on their professional ability to act as signposters. See also the Reskilling for Research (PDF) report by Mary Auckland for Research Libraries UK.
RDMRose is looking at taught and CPD learning for information professions. Initially they suggest that librarians have the potential to carry out RDM training but will need to extend their professional identity and build on their existing roles and skills. Many lack knowledge of research culture and need to understand this in order to be trusted.
At the University of East London (UEL) the RDM project builds on the Library’s established lead in RDM. They point out that the Library has a reputation for collaborative projects and is credible; it has proven expertise in collecting and cataloguing and in compliance (copyright, managing the CLA licence); and it values sharing, cares about impact through citation and runs the repository.
Design of training sessions
The SoDaMaT (Sound Data Management Training for electronic music) project, Queen Mary, University of London and UEL talked about the design of training sessions. There was general discussion as many attendees had previous experience. All were cautious about generic workshops, as researchers in different disciplines or departments will work in different contexts and environments and so advised considering who the training is aimed at.
Tips from the experienced ones:
- keep it short (1 hour optimum)
- include technical basics, formats, storage, use of folders as well as theory
- attach it to other training that is seen as essential or valuable
- don’t call it ‘digital preservation’
- fit it into existing research skills programmes especially for post-graduate researchers
- check consistency of advice with other training on RDM e.g. qualitative data training
- provide online as well as face-to-face sessions and integrate them
Buy-in from your institution’s senior management team is essential; they are now more likely to be receptive due to the current high profile of the Finch report, Open Access agendas and the impact on research funding. SoDaMaT suggest that researchers are engaged by evidence. They use dramatic stories of data loss, and point out the IPR consequences of ‘curation in the cloud’. The University of Leicester’s RDM web page is presented as a scenario:
“What would you do if you lost your research data tomorrow? RDM isn’t principally about complying with policy. It means helping you to complete your research, share the research and get credit for what you have done.”
Effective advocacy emphasises the value of RDM to researchers to make the business case for introducing training:
- saves researchers’ time looking up previous work
- helps you get funding
- it is like ethics, doing it well will enable you to do your research better (UEL)
- sends your research into the future, enables citation of data along with articles
Components of good RDM
Good practice in RDM has usually been boiled down to four steps, variously:

| Source | Step 1 | Step 2 | Step 3 | Step 4 |
|---|---|---|---|---|
| University of Leicester | Create | Organise | Access | Look after |
| IHR/JISC | Start early | Explain it | Store it safely | Share it |
Further details from: SoDaMaT’s wiki ‘Online training materials’, University of Leicester’s RDM page, the JISC ‘Incremental’ project page, LSE/Cambridge/IHR/ULCC’s ‘Sending your research material into the future’ project.
Options for publishing data
The trainer needs to ask questions about the data and about working practices and agree a definition of data, because “researchers have many ways to approach RDM on their own terms” (UEL).
- What data is available? (e.g. in science raw data/usable data/datasets/supporting material/all worked data)
- Who decides what data to save and give access to? (Referee? RCUK? PI?)
- Where can data be published? (national data archives/learned societies website/institutional repositories/journals). Not all of these will be available in all disciplines.
There was a discussion of data publication issues, covering:
- Culture change, how much awareness is there of the issues of RDM?
- Citation of data supporting published articles works well if publishers hold the data; the data package then gets a DOI (see the Dryad project)
- Publishers/learned societies say they will do what communities want, therefore there is an opportunity to influence development of other players
- Publishers should not take IPR of data, advise use of a CC-BY licence if possible
- EPSRC institutional ‘Roadmap to research data management’ includes training
As discussed during the JISCMRD Programme Launch in Nottingham, projects thought it would be good to share what each one is doing regarding the Data Asset Framework (DAF) and/or gathering user needs, in order to see if it can be used/re-used by the other projects. We have been describing Kaptur’s approach, and the rationale for this approach, in a series of blog posts. Previous blog posts on this topic are available by searching the tag ‘environmental assessment’; the two most relevant of these are ‘Methodology for the Environmental Assessment’ and ‘Environmental Assessment interview questions’. Feedback is welcomed.
Kaptur is not using DAF, although we have considered what can be learned from the DAF approach. DAF provides institutions with a means to:
“identify, locate, describe and assess how they are managing their research data assets”
DAF recommends that you begin by deciding what you mean by ‘data assets’; for example, they mention:
“numerical data, statistics, output from experimental equipment, survey results, interview transcripts, databases, images or audiovisual files, amongst other things”
Our initial probing interviews and research in the area of visual arts data tell us that we are not ready yet to pin this down to specific assets, although potentially all of the above could be included. One of the issues arising out of the probing interviews was the concept of what ‘research data’ was in the first place. We decided to undertake formal interviews to gather detailed qualitative information that could better inform Kaptur and help to build relationships with visual arts researchers at the four institutions. This approach, whilst not following DAF exactly, did also include questions that enabled information to be gathered about the types of data asset that researchers were producing and how they were being managed.
The scope of the Kaptur Environmental Assessment report has been defined in our methodology, which we make available for use and re-use:
Following the imminent publication of our report, the next stage is to establish working groups in each institution as a way to both continue the dialogue with the visual arts researchers and also to encompass a wider range of stakeholders. We have been looking at the CARDIO assessment tool, particularly as this is designed to “improve communication and understanding” between stakeholders. However, this is normally used following a more formal data audit procedure, and therefore we may adapt the approach of CARDIO to suit our timescales and circumstances. For example, there is a clear benefit to holding face-to-face meetings with all the stakeholders and this will take priority; however, it may be that questions or elements of the CARDIO tool can be used to inform the agenda for these meetings. This is yet to be discussed, and will be raised at the Steering Group meeting on Monday as part of the Implementation Plan.
One sixth of the way through Kaptur, and this is our update for the third month:
1. Project Outputs
- consortium agreement – in the process of being signed (delayed due to Christmas; now expected before the end of January)
2. Environmental Assessment
- The 16 one-hour recorded interviews have now been transcribed. Each Project Officer has been reviewing the transcripts, marking them up and checking that they are anonymised in order to collaboratively analyse on Monday 9th and Tuesday 10th January 2012.
- Robin Burgess is working on the new EPrints institutional repository for the Glasgow School of Art (presentation to be given at JISC RTE event in February). This is very useful for Kaptur both in terms of the relationships that Robin has already been building and as it will mean that all four institutional partners will then have ‘kulturised’ EPrints repositories for research outputs. This may have an impact on decisions to be made by the Technical Manager regarding the research data management system.
- John Murtagh wrote a blog post for the DCC, ‘An arts perspective: day two and three – the sixth DCC Roadshow on data management’, and also attended ‘Drawing: Interpretation / Translation’ at University of the Arts London.
- Tahani Nadim began work on Defiant Objects, which “will produce a guide and recommendations for supporting non-standard deposits (‘defiant objects’) in institutional repositories”; this is also one to watch for Kaptur as we are already looking at areas of potential overlap between ‘research data’ and ‘research objects’ in the creative arts.
- Anne Spalding attended the IDCC Workshops and wrote blog posts on delivering post-graduate research data management training and data re-use (how can metadata stimulate re-use?). A meeting was held with the UCA Research Office, and the Project CAiRO training module was promoted.
- The Project Manager attended the 7th International Digital Curation Conference and gave a ‘minute madness’ presentation on Kaptur; a blog post about the poster is available.
- The Kultivate project held its final workshop on Linked Data, which included a short presentation on Kaptur. A Storify of the event is available.
December is always a challenging month (due to leave and tying up loose ends) which is why we pressed ahead with the project work so quickly during October and November. During December we continued to build links with other projects, the DCC, and internationally at the IDCC conference. The biggest issue was making sure everything was in place for the data analysis to occur in early January including the transcripts and venue. We will be meeting at Goldsmiths, University of London and a blog post will follow here regarding our analysis.
Simon Hodson, JISCMRD Programme Manager, has asked all projects to do a short blog post about commonalities.
Kaptur has previously highlighted the commonalities with the first round of JISCMRD programme funding (2009-11), including how we plan to use training materials produced by Project CAiRO; we have also spent time looking at JISC Incremental. The commonalities identified so far from the JISCMRD Programme launch are:
The session on the last day put a few of the projects together in an ‘Arts and Humanities’ group. Some of the projects that are particularly relevant to us are:
- data.bris – Simon Price, Project Manager and Stephen Gray, Digital Projects Support Officer (formerly Project CAiRO Project Manager)
- Sustainable Management of Digital Music Research Data – similar approach to us with their interviews
- iridium – covering a range of subject areas including the arts
- REWARD – a six month project at UCL (Archaeology)
- Research Data @Essex
2. Pilot infrastructure
Kaptur is one of 17 projects in Strand A of the JISCMRD programme (Simon Hodson’s blog post on this) – we are therefore seeking to both learn lessons from more experienced projects in this strand (who had previous JISCMRD funding or links) and also find out how similar pilot projects are approaching things.
- Managing Research Data: a pilot study in Health and Life Sciences – a pilot project at University of the West of England (UWE are also a member of the Kultur II Group and have a ‘kulturised’ EPrints research repository!)
- DataFlow – keeping an eye on Data Management Rollout at Oxford (DaMaRO) – although DataFlow is in the ‘Cloud’ there does also appear to be an option to have a local Web server.
- SWORD-ARM – SWORD & Archaeological Research data Management
- During the Programme Launch there was a lot of talk about DCC tools including DMP Online, DAF, and CARDIO – look out for a future blog post about our environmental assessment methodology.
- Also keen to learn lessons from the MaDAM project, which is now MiSS (MaDAM into Sustainable Service) – http://www.miss.manchester.ac.uk/ (great URL!)
- Research360@Bath looks good too!
Please let me know if I have overlooked any projects that are relevant to Kaptur – we are interested in engaging with other projects and welcome feedback!
- JISCMRD02-Commonalities Google spreadsheet
- Google Reader “Research data management” bundle created by Jez Cope
The following blog post has been written by Tahani Nadim, Kaptur Project Officer, Goldsmiths, University of London.
The sixth DCC Roadshow on data management, organized in conjunction with Cambridge University Library, began with DCC’s own Associate Director, Graham Pryor, highlighting the current big theme summarized by “3 Rs”: re-use, regeneration and repurposing of data. His talk focused on the scale and complexity of data generation in all sciences though, once more, the “hard” sciences received most attention with examples like the Large Hadron Collider (15 petabytes of data annually) and GenBank, the NCBI’s nucleotide sequence database (holding approx. 130 billion bases in 140 million sequence records in the traditional GenBank divisions). Nathan Cunningham, of the British Antarctic Survey’s (BAS) Polar Data Centre, gave some very dazzling and dizzying examples of the range and complexity of data produced by the BAS – “data bling” and “Disney science” as he called it. Some of the challenges faced by Cunningham and colleagues relate to turning unstructured into structured data; describing data in such a way as to make it discoverable and useable; and, importantly, finding ways to automate this.
For Cunningham, so-called data “mash-ups” (combining data on e.g. sea surface temperature, feeding routes of penguins, chlorophyll levels or high-resolution sea ice images) provide decision-making tools as well as diagnostic tools. David Shotton, a cell biologist turned bioinformatics guru, made very similar arguments for the biosciences. Introducing a host of data curation projects, particularly focused on digital imaging, Shotton pointed to reasons why many researchers still do not publish their data: information and work overload; pressure for financial viability (to get money for their departments); cognitive overheads and skills barriers. The latter was also very clear from Cunningham’s presentation: data curation requires specialised knowledge of the data-generating discipline and, more often than not, cannot be ‘delegated’.
The presentations by Pryor, Cunningham and Shotton left little doubt about the fact that data sets are becoming the new instruments of science and establishing new ways of working (e.g. collaborative modelling in a global virtual laboratory, as done in the neurosciences in the CARMEN project), but this poses a number of critical questions for researchers and institutions alike: Who will analyse all this data and how? Is digital data the new special collections? Regarding regulation, Pryor noted that in some cases, for example in the case of European IP laws, regulation actively obstructs data sharing as well as digital preservation. Pryor voiced concerns about the handling of data management requirements amongst research councils’ policies, pointing in particular at the EPSRC’s timescale and vague language.
In terms of providing access to this data, Pryor introduced some commendable initiatives such as the Panton Principles as well as open science applications such as the Citizen Science Alliance. Again, open data throws up a lot of questions: How to be “open” but also how far to go with being “open”? What are the incentives for being “open”? How to handle sensitive data (particularly in the biomedical sciences)? One study on the current handling of research data mentioned by Pryor, the Incremental project, was later described in more detail by Elin Stangeland of University of Cambridge’s DSpace repository. A JISC-funded collaboration between Cambridge and the University of Glasgow, the project produced a scoping study before drawing together guidance and support literature, providing training in data curation and creating audiovisual learning resources.
A different perspective was offered by Dr Anne Alexander. Actually, a doubly different perspective since this presentation came from a researcher in the humanities. Alexander’s research focuses on Middle Eastern politics, particularly the labour movements and similar political movements in the region. Her current project, which looks at the Egyptian revolution, demonstrates the dramatic transformation in data resources she engages with. She commenced her presentation with an image of her usual data, such as notes, newsletters, newspapers and analogue tapes; the remaining part of her talk was accompanied by Facebook pages, Twitter feeds, YouTube videos and other social media platforms. Alexander argued that the political landscape has radically taken in the novel spaces offered by social media: the strike committee of sugar refinery workers in Egypt, the strike committee of doctors in Egypt as well as the ruling military council have Facebook pages which are actively enrolled in their respective political practices.
The problems faced by the researcher are plentiful. One is how to capture (save, store, make discoverable, etc.) not just the discrete data entity (the tweet, the video, the picture, the status update, etc.) but also the context, that is, the comments, the other “recommended” or “related” content and other dynamically created relations and objects. Another issue pertains to the difference between public and published: pulling comments made by activists against authorities out of the digital realm (e.g. a Facebook wall) and committing them to paper and/or circulating them by other means and routes poses serious ethical questions. Equally confounding is the problem of “ownership” raised in the discussion: If everything is owned by Facebook – what is a researcher to do?
In conclusion, Alexander suggested that it is not helpful to think of the Internet as an infinite archive. This gives us a false sense of security. Instead, researchers need to acquire archival skills.
This is a brief post on our second meeting which occurred at the Royal Festival Hall, London on 31st October. A more detailed post will follow soon with a progress report on Kaptur’s first month.
All four Kaptur Project Officers were able to meet in London, and each had carried out two informal probing interviews with visual arts researchers. There is also a report available from Goldsmiths, University of London on their findings here: Goldsmiths Probing Interviews (SlideShare.net)
As each Project Officer reported back on their findings to the group, the Project Manager wrote key phrases and points on post-it notes. This was done to both record the data and to enable selection. Out of the resulting discussion each Project Officer chose two issues or themes which they thought particularly relevant; these were then reflected upon again in the group and refined. Finally this led to the drafting of the interview questions and methodology (including consent form for interviewee participants etc.), which will be made available soon. There will also be some questions that can cross over with results from the JISC Incremental interviews, and this may be a useful future comparison.
Yesterday members of the KAPTUR project team met at the British Library to discuss the methodology for the environmental assessment. Following previous email discussions and telephone chats, by the end of the meeting we had agreed on the following approach:
- There were elements of the JISC Incremental project’s scoping study (PDF) that we could re-use e.g. their approach with having semi-structured interviews, and the cross-departmental comparison of data gathering across the institutional partners.
- However, there were also elements of the study that may not be appropriate e.g. the interview questions. As part of our problem space we are trying to uncover ‘What is arts research data?’ and ‘What are the issues?’ and we do not want to be too prescriptive, at least initially.
- Each Project Officer will now organise informal interviews with visual arts researchers to gather initial views and issues that will inform our themes for the main data gathering exercise.
- We have arranged to meet again on Monday 31st October when the Project Officers will present their findings to the group.
A note: the role of the environmental assessment in the KAPTUR project is to underpin both the modelling and technical stages which are the main body of the work KAPTUR is undertaking. It is hoped that in carrying out the environmental assessment the four Project Officers will build up relationships that will feed and sustain the work of the project after the end of the project; and that the themes and issues that are raised can be addressed during KAPTUR.