KAPTUR Steering Group meeting, HEFCE, 18th July

View from HEFCE, 12th Floor, Centre Point, London. Photo: MTG

Key points from the meeting:

  • It was noted that there was diversity among the four institutions in how they are drafting their RDM policies: we can still collaborate and learn from each other, but the approach is necessarily different at each institution.
  • University of the Arts London are really benefiting from their participation in the DCC University Engagement programme; the UAL Project Officer is working an extra day per week on this and as a result has been able to revisit and extend the KAPTUR Environmental Assessment through 20 five-minute telephone calls, which will be followed up with one-hour in-depth interviews with visual arts researchers.
  • There was discussion about a definition for visual arts research data: any definition might be constraining, but one was needed in order to move forward with the RDM policies. A working definition was presented to the KAPTUR Steering Group three months ago in response to questions raised by the UAL working group: http://www.slideshare.net/kaptur_mrd/kaptur-news06
  • Feedback on training/support and the KAPTUR toolkits: the recommendation was to create KAPTUR videos about visual arts research data instead of hosting workshops at each institution (we already had plans to re-use content from the previous JISCMRD programme, e.g. http://www.youtube.com/user/GUdatamanagement). I still think the face-to-face aspect of the workshops would be useful, but maybe there is a way to run shorter sessions and use the videos as part of these? We will discuss this at our next project team meeting in September.
  • The Steering Group liked the Figshare interface and thought it would be appealing to visual arts researchers as well as easy to use; there were lots of questions about both DataStage and Figshare.
  • Feedback on Sustainability: recommendation to get an idea of the costs of the proposed technical infrastructure, including estimates of the staff time required for ongoing support of the systems.

The presentations are available from SlideShare.

The Triumphal Quadriga or Horses of St Mark, facade of St Mark’s Basilica in Venice. Creative Commons Attribution 3.0 Unported license.

It was great to welcome Laura Molloy, Researcher at the Humanities Advanced Technology and Information Institute (HATII), to the Steering Group meeting. After the meeting Leigh, Laura and I met to discuss the project from the perspective of her role as JISCMRD Evidence Gatherer. As well as discussing impact and gathering evidence about benefits, Laura came up with the image of a chariot (the KAPTUR project) being pulled by four horses (our four institutions). I really liked this idea of the race, and of the need for the four to be well matched in order to make the project successful.


#jiscmrd – Kaptur’s post on benefits and metrics #KRDS

Simon Hodson, JISCMRD Programme Manager, has asked all 18-month JISCMRD projects to write a blog post about the key benefits each project expects to achieve, and the metrics we will use to evidence these at the end of the project.

Links

Following a presentation by Neil Beagrie, Director of Consultancy at Charles Beagrie Ltd, the JISCMRD projects were provided with a ‘Summary of Benefits Identified by the RDMI Projects’ and a ‘Summary of Metrics Identified by the RDMI Projects’. We were invited to select three benefits and then match these with the appropriate metrics, making sure to include both quantitative and qualitative metrics for each benefit. I would like to emphasise that the following has not yet been discussed within the project team and is subject to confirmation.

Benefits

  1. Sustainability of research data infrastructure.
  2. Change to user practices.
  3. Mitigating organisational risks.

Metrics

  1. Measured by each institution creating and approving its own Business Costs and Sustainability plan; the ultimate proof is the longevity of the research data infrastructure. Qualitative data will be gathered through the Steering Group meetings, which include high-level senior staff from across the four institutions. Quantitative data may include percentage or estimated cost savings/efficiencies for central services and/or departments.
  2. Measured by taking a snapshot of existing practice at the four institutions through the Environmental Assessment report, then maintaining user engagement throughout the project and taking further snapshots at key stages to monitor progress. Qualitative data will be gathered through the interviews undertaken as part of the Environmental Assessment report and through ongoing engagement, e.g. working groups and/or focus groups. Quantitative data will be gathered through online questionnaires and/or feedback forms recording the project’s impact on working practice, undertaken at key points (e.g. we are planning an online survey in January) and after training events; if online training materials are created, usage statistics will also be gathered.
  3. Also measured via snapshots of existing practice through the Environmental Assessment report and at key stages throughout the project. Qualitative data will include a range of stakeholder examples of improved risk management, e.g. organisational practice before KAPTUR and how this has changed during/after the project. Quantitative data may include a percentage improvement in routine back-up of data, and/or a percentage improvement in research data management awareness and in policies/systems.

Neil emphasised that when using the KRDS tools it is a good idea for one individual to do the initial work and then to work this up in a project team context; the points raised in this blog post will therefore be discussed at our next meeting in early January, and possibly again at the Steering Group meeting. Neil also mentioned that it is important to adapt the tools to your project’s needs. From reading the documentation I am also aware of the need to start with the Benefits Framework tool before moving on to the Value Chain and Benefits Impact tool. The example worksheets produced by other projects are really useful, in particular those from the UK Data Archive and the Archaeology Data Service. Reference: Report and Presentations from the JISC Digital Curation/Preservation Benefits Tools Project Dissemination Workshop.


Day 2, Seventh RDMF, University of Warwick

Scarman House, University of Warwick. Image used with permission of Warwick Conferences.

RDMF7 had a great venue, good food and good company; limiting attendance to 50 worked well to encourage discussion. Day 2 is detailed below in terms of useful links and key points.

Impact through data management: Where are the wins? What are the pitfalls? – Cameron Neylon, Senior Scientist, Science & Technology Facilities Council

Cameron spoke about motivation from his own perspective as a researcher: there is motivation to spend time on a publication, but not necessarily to prepare a dataset. The motivation for good data management practice might be that it makes things easily available when you come to write a paper.

A dataset needs to be clearly associated with the research questions, and needs to record how it can be discovered/re-used.

At the moment there is a focus on data, but there are other aspects we need to consider as well, including process, software and materials.

The funders’ role as motivators. A myth or achievable reality? – Ben Ryan, Senior Evaluation Manager, EPSRC

EPSRC timetable for building RDM capacity: organisations should have a roadmap in place by May 2012 and should be compliant by May 2015.

Data sharing agreements can cover copyright and other issues, but enough information should be made available to describe the data, how to access it, and any limitations on access, if there are any.

EPSRC sets a deadline of 12 months after data creation; it then expects access to both physical and digital data for a minimum of 10 years from the last date on which access to the data was requested by a third party, or from the date that any researcher’s ‘privileged access’ period expires.

Institutional measures to encourage and facilitate effective data management and sharing. A matter of cash, careers or cultural change? – Miggie Pickton, Research Support Librarian, University of Northampton

When Miggie began the DAF investigation, little was known centrally at the University about data storage policy or procedure. They already had NECTAR, an EPrints repository, in place to store and preserve digital data; a future development of the RDM work may be to create a second EPrints repository to store research data, with link-up between the research outputs and research data as appropriate.

They have established a Research Data Working Group comprising the University Records Manager, a representative researcher, the Head of Research and Enterprise, and Miggie.
One of the issues with managing research data was selection and disposal – some researchers were reluctant to set a disposal date (‘after I die’).
The Northampton approach was about encouragement rather than mandate.
Simplified internal procedures have been set up to monitor whether policy and procedures are being followed.

RDM is now a standard part of research student inductions. They also focus on dissemination via multiple communication channels, e.g. school research forums, the university website and one-to-ones; they involve records managers, library staff and researchers in developing training sessions and guidelines; they gain support from opinion leaders (as well as senior managers) to raise awareness amongst academic and support staff; and they demonstrate the link between good RDM and career progression, e.g. through increased data citation.

Where next? – Disseminate new policy to all schools and divisions; develop RDM training programme; and provide a storage facility.

Benefits analysis: challenges and opportunities – Neil Beagrie, Director of Consultancy, Charles Beagrie Ltd

Neil clarified what he meant by ‘long term benefits’: benefits in the near term could be up to 5 years; in the long term, more than 5 years.

It should be noted that not all data downloads equate to data use. For example, a teacher may download a dataset once, add it to their Virtual Learning Environment and through this make it available to 50 students, each of whom downloads it there, yet the usage stats record just the one download. There are also examples of users downloading an item once and then making intensive use of it for three years, and of users browsing and downloading but then rejecting or not using an item – a discard is not necessarily negative, as it can be part of the learning process. Another question is what counts as a download versus an ‘access’, for example someone running lots of queries rather than actually downloading anything – all of which shows the complexity of use.

Break-out sessions

Group 1: The drive for effective research data management – is it much ado about nothing?

The group decided straight away that it was not much ado about nothing; the real question was how to motivate people to do RDM.
Some of the positive suggestions were: embed principles and practices early on, so they become part of the research lifecycle; appeal to people’s self-interest; have good stories; offer practical tips; make sure roles and responsibilities are clearly defined, as ownership is needed at lower levels as well as at senior management level; and have infrastructure in place to make good practice possible.
The group also considered the costs of data management, and whether there was a role for funders in making a positive example of institutions that already meet their requirements.

Group 2: What really are the sticks and carrots that will make a long-term difference to the pursuit of structured data management processes?

Ref: Paul Stainthorp’s blog post includes Group 2 feedback.

Group 3: Who pays and who reaps the benefit? The incentives for funders, institutions and researchers for investment in research data management.

  1. Research Funders to make funding available for research based solely on existing datasets – this gives an incentive to researchers.
  2. DCC to include a costing module in the DMP Online tool to allow sensible estimates of cost.
  3. Research Funders to be explicit that RDM costs should be included in applications.
  4. Publishers to review peer-review process to include validation of data (some comments about the peer-review process being slow enough as it is without including datasets as well; some disciplines already have measures in place to peer-review datasets).
  5. Research Institutes to provide training and support to researchers from early stages, in order to encourage best practice.
  6. Professional bodies to promote good RDM through policy and training.
  7. DCC to collect examples of the costs of bad RDM: damage to an institution’s/researcher’s reputation; financial costs; even the loss of lives (an example was given of a cancer patient study). Liz Lyon suggested that this could be a piece of work titled ‘Reputation and Risk’.

Successes in sharing: obtaining data from more than 1,000 sources worldwide – Catherine Moyes, Malaria Atlas Project Manager, University of Oxford

Malaria Atlas Project: http://www.map.ox.ac.uk/

The MAP team collate data and generate geo-spatial models; they work across five continents with very large datasets. They work with four different data types: mosquito occurrence data points, genetic data points, parasite prevalence data points, and case incidence data points. One survey equals one data point, and the data points are geo-positioned, which adds value to the data.
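
To make the ‘one survey equals one data point’ idea concrete, here is a minimal sketch of what a geo-positioned data point might look like as a record; the field names are illustrative assumptions, not MAP’s actual data model:

    # Illustrative sketch of a geo-positioned survey data point; the field
    # names are assumptions, not MAP's actual data model.
    from dataclasses import dataclass

    @dataclass
    class SurveyDataPoint:
        latitude: float    # geo-positioning lets points feed geo-spatial models
        longitude: float
        data_type: str     # e.g. "parasite prevalence" or "case incidence"
        value: float       # the measurement reported by the survey
        source: str        # the contributing study, for acknowledgement

    # One survey produces exactly one such data point:
    point = SurveyDataPoint(latitude=-1.29, longitude=36.82,
                            data_type="parasite prevalence",
                            value=0.12, source="Example et al. 2005")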

Catherine showed the following table of ‘carrots versus sticks’:

  Carrots                    Sticks
  direct funding             calling upon the journal
  applications for funding   calling upon the original funder
  co-authorship              calling upon the institution
  citation                   –

Incentives you could offer researchers include: paying them for data cleaning (rather than for the data itself); offering services such as application writing, or providing a letter of support; offering to write a joint paper; and citing the researcher’s data.

The MAP project has not used any of the above incentives, as this would have been impossible when obtaining data from over 1,000 sources. Rather than providing incentives, it is also possible to remove disincentives, which MAP does by:

  • explaining clearly and succinctly who they are and what they are doing;
  • being precise about exactly what data they are requesting;
  • reading the relevant paper carefully before making a request that is linked to it;
  • being complimentary and diplomatic;
  • being persistent – not to the extent of ‘badgering’, but politely asking again and again;
  • taking on any work required to sort the data out (they don’t expect people to do anything more than email the data to MAP);
  • providing requested undertakings about use of the data;
  • publicly acknowledging all data providers online;
  • having data requests made by a senior team member, e.g. a professor who writes to contributors as a peer, in a complimentary manner.

It also helps that the project began in 2005 and has built up a good reputation. The 1,000+ sources worldwide include a wide variety of groups – ministries of health, non-governmental organisations, public health reports, journal articles, and academic researchers – but MAP has used the same approach with all of them. A permanent feature of their website is the ‘acknowledgements’ section, which includes the names of all contributors; there is also an acknowledgements section within each individual dataset record.

Catherine posed the question ‘why share raw data?’; as well as the usual answers, she also mentioned the book Sharing Publication-Related Data and Materials.

There were some interesting slides about the requests they have handled for access to their data. Their approach now is that when MAP publish a paper, they also release the relevant data alongside it. One example of sensitive data that they cannot release is from Myanmar (Burma), as the location information would put people’s lives at risk. Of all the people they asked for permission to release data, only two didn’t want to be cited; one contact was happy for their data to be released but didn’t want to be contacted about it; and only one person said ‘no’.

There are no registration or access agreements as they want to encourage use and therefore get rid of barriers.

They have a PostgreSQL database behind the scenes; data is downloaded as Comma Separated Values (CSV). A composite citation is included in the resulting spreadsheet, with up to three different citations that may relate to one row of data. It should be noted that the project does not ask people to cite ‘MAP’ itself.
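
As a rough illustration of how such an export could work – a minimal sketch, assuming a hypothetical table and column names rather than MAP’s actual schema – the composite citation can be assembled per row at query time and written out as CSV:

    # Minimal sketch of a PostgreSQL-to-CSV export with a composite citation
    # column; the table and column names are hypothetical, not MAP's schema.
    import csv
    import psycopg2  # assumes a reachable PostgreSQL database

    conn = psycopg2.connect("dbname=map_example")
    cur = conn.cursor()

    # array_to_string() skips NULL elements, so rows with one, two or three
    # citations all yield a sensible composite citation string.
    cur.execute("""
        SELECT latitude, longitude, prevalence,
               array_to_string(ARRAY[citation_1, citation_2, citation_3],
                               '; ') AS composite_citation
        FROM parasite_prevalence
    """)

    with open("map_export.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["latitude", "longitude", "prevalence",
                         "composite_citation"])
        writer.writerows(cur.fetchall())

    cur.close()
    conn.close()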

The terms under which the MAP outputs are available:

  • data: no terms and conditions apply
  • software: GitHub – GNU Public Licence (open source) and free
  • probability distributions: email them and they’ll send a DVD, as the files are too big to download
  • GIS surfaces: Creative Commons licence and free
  • estimates of burden/populations at risk: no terms and conditions apply
  • map images: Creative Commons licence and free

A member of the audience asked Catherine about sustainability. She mentioned that data collection hasn’t stopped, and that they hope releasing the data may encourage it even more; ultimately, though, it depends on how long people remain interested in the data and continue downloading it.

The Institutional Data Management Blueprint and incentivisation – Jeremy Frey, Professor of Physical Chemistry, University of Southampton

IDMB aims and objectives: “to create a practical and attainable institutional framework for managing research data that facilitates ambitious national and international e-research practice”; and “to produce a framework for managing research data that encompasses a whole institution (exemplified by the University of Southampton)”.

Frey presented some of the IDMB findings (PDF). For example, in answer to the survey question ‘Who do you believe owns your research data?’, only 25% of respondents answered ‘School/University’; Frey believes most respondents simply weren’t aware, as the answer should have been the institution in most cases.

The University of Southampton’s data policy introduces no new legal or other principles; it is mainly about just applying existing policies for physical objects to the digital objects that have been generated.

Almost two-thirds of respondents answered that they were responsible for managing their own data; most of this was stored on ‘CD, DVD, USB, or external hard disk’; and a significant number of people didn’t know how much data they had. Respondents reported that reusing their own data was ‘relatively easy’ – Frey disagreed; perhaps it was easy only compared with accessing other researchers’ data. One participant mentioned visiting museums to refresh their memory of objects, with varying degrees of success in locating them.

Frey suggested using the terms ‘context’ and ‘process’ rather than ‘metadata’.

There was a useful slide on data management costs.

During the Forum the group also considered research data as an object in its own right that may or may not be submitted to the REF 2014:

“In addition to printed academic work, research outputs may include, but are not limited to: new materials, devices, images, artefacts, products and buildings; confidential or technical reports; intellectual property, whether in patents or other forms; performances, exhibits or events; work published in non-print media. An underpinning principle of the REF is that all forms of research output will be assessed on a fair and equal basis. Subpanels will not regard any particular form of output as of greater or lesser quality than another per se.”

REF 02.2011, Assessment framework and guidance on submissions, July 2011 (PDF), paragraph 106 on p. 22.