With thanks to Carlos Silva, KAPTUR Technical Manager, for the following blog post.
On 18th February I attended a workshop led by the JISC-funded Orbital project, to gather information about the open-source software CKAN and how it could be used to support research data management in the academic sector.
The workshop started with a presentation from Mark Wainwright (community co-ordinator for the Open Knowledge Foundation) on the latest release of CKAN, its origins and potential in the academic community.
One of the big advantages of using CKAN is that the ‘core’ system is surrounded by APIs, making it flexible enough to accommodate different user and institutional needs. This means the core software can be updated without affecting the APIs, and without having to adapt external code to fit the core software.
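To make this concrete, here is a minimal sketch of how an external tool might talk to CKAN’s action API (the stable HTTP+JSON layer over the core). The instance URL is hypothetical and the response shown is a truncated illustration, not output from a real server.

```python
import json
from urllib.parse import urlencode

# A dataset search against a hypothetical CKAN instance is a simple GET
# to the versioned action API; the core can change underneath without
# breaking this call.
CKAN_BASE = "https://ckan.example.ac.uk"  # hypothetical instance
query = urlencode({"q": "visual arts", "rows": 10})
search_url = f"{CKAN_BASE}/api/3/action/package_search?{query}"

# Every action API response is a JSON envelope with "success" and
# "result" keys; an invented, truncated search response:
sample_response = json.loads("""
{
  "success": true,
  "result": {
    "count": 1,
    "results": [
      {"name": "sketchbook-scans-2012", "title": "Sketchbook scans"}
    ]
  }
}
""")

if sample_response["success"]:
    for dataset in sample_response["result"]["results"]:
        print(dataset["name"], "-", dataset["title"])
```

Because the envelope is uniform across all actions, client code written against one CKAN release tends to keep working as the core evolves.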
Another promising feature is CKAN’s ability not only to harvest other CKAN databases, but also to search other types of repository, such as EPrints and DSpace. The harvesting mechanism covers a range of sources: not only EPrints and DSpace, but also geospatial servers, web catalogues and other HTML index pages.
In terms of sustainability, CKAN has been developed over the last six years, so it is now relatively mature, with a streamlined process for adding features, fixing bugs and enhancing the core services. The latest version, 2.0 (recently released as a beta), promises to be an exciting release with more visually enhanced tools, an improved groups feature, customisable metadata and a rich search experience based on Apache Solr.
The workshop continued with a presentation from the data.bris project at the University of Bristol. It is amazing to note that each Principal Investigator can apply for up to 5TB of storage, free of charge and backed up securely for 20 years!
Academics receive a mapped network drive which they can use to deposit content; however, this requires additional features to manage research data. The data.bris project was therefore interested in CKAN for its flexibility, data access controls (the ability to have private datasets), organisation schema, ability to share with external researchers, and search engine.
In the future, the University of Bristol is considering two instances of CKAN, one for a public read-only catalogue of research data publications and another for controlled access (which would include teaching and other types of data).
The third presentation was from Orbital; Project Manager Joss Winn provided a virtual tour of the latest tools developed by the project. They have connected CKAN to several other systems: their EPrints repository and various departmental databases, such as an awards management system.
The Orbital set-up allows their researchers to keep different types of data in a central place, including policies, profiles, publications and analytics information for specific outputs, making the most of the CKAN software.
The demonstration included mention of the software created to enable deposit of data from CKAN to their EPrints repository – something we have been anticipating for the last few months, and an exciting development for the sector. Orbital have released the code through GitHub, and in theory it should work with CKAN version 1.7. The functionality enables CKAN to submit metadata to EPrints using the SWORD2 protocol, but not the actual files themselves – instead, a link is added to EPrints which points back to the files deposited in CKAN.
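As a rough illustration of what a metadata-only SWORD2 deposit involves (this is not the Orbital code itself), the sketch below builds the kind of Atom entry that would be POSTed to an EPrints collection URI, carrying metadata plus a link back to the dataset in CKAN. All names and URLs are invented.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DC = "http://purl.org/dc/terms/"
ET.register_namespace("", ATOM)
ET.register_namespace("dcterms", DC)

def build_sword_entry(title, creator, ckan_url):
    """Build an Atom entry carrying dataset metadata plus a link back
    to the files held in CKAN (the files themselves are not sent)."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{DC}}}creator").text = creator
    # The repository record points back at the dataset landing page.
    link = ET.SubElement(entry, f"{{{ATOM}}}link")
    link.set("rel", "related")
    link.set("href", ckan_url)
    return ET.tostring(entry, encoding="unicode")

entry_xml = build_sword_entry(
    "Sketchbook scans",
    "A. Researcher",
    "https://ckan.example.ac.uk/dataset/sketchbook-scans-2012",  # hypothetical
)
print(entry_xml)
# The entry would then be POSTed to the EPrints SWORD2 collection URI
# with Content-Type: application/atom+xml;type=entry.
```

The key design point is that only this small XML document crosses the wire; the (potentially very large) data files stay in CKAN.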
The Orbital team are proposing a two-year roadmap to their senior management team, who would take responsibility for carrying the project forward and embedding it further in the University of Lincoln’s infrastructure.
During the group discussion session, workshop participants compiled a comprehensive list of about 80 tools, features, amendments and requests that we would like to see in a new version of CKAN (a Google Docs spreadsheet is available: http://lncn.eu/mxz2). Again in groups, we did a gap analysis of the specific items requested, and a CKAN expert was available to answer questions.
As an academic community we found that there were lots of similar challenges which should be easier to address collaboratively.
From the visual arts community’s perspective, although CKAN can’t currently address everything on our user requirements list (PDF), there is scope for further development, and this is continuing in the right direction.
With thanks to Emma Hancox, Assistant Archivist, University of the Arts London for this blog post.
From Tuesday 15th to Wednesday 16th January I attended the 8th International Digital Curation Conference (IDCC) in Amsterdam, entitled ‘Infrastructure, Intelligence, Innovation: driving the Data Science agenda.’ The conference was an invaluable opportunity to learn from the research data management experience of professionals from a range of countries and backgrounds. Here I will draw on the highlights of most relevance to the KAPTUR project; however, an overview of the full conference, including presentation slides, is available on the Digital Curation Centre website, as are videos of some of the talks.
Day One: Tuesday 15th January
‘Growing an Institution’s Research Data Management Capability through Strategic Investments in Infrastructure’, Anthony Beitz, Monash eResearch Centre.
The key message I took from this talk was Anthony’s call to ‘adopt, adapt and develop’ – in essence, look at solutions that already exist and build on them. Anthony advocated going out into the research community to see what solutions researchers already use, as they tend to be more loyal to their research community than to their institution. He also emphasised that a lot of the work has already been done for us: we can use Facebook for marketing and Twitter for customer service, and we can adapt a range of open-source software to meet our needs.
‘Building Services, Building Communities, Supporting Data Intensive Research’ Patricia Cruse, Director, University of California Curation Centre.
Patricia Cruse emphasised the importance of engaging researchers as early as possible in the digital curation lifecycle. She gave two very useful pieces of advice: ‘start small’, with a simple solution that can be built upwards when more complex problems are met, and employ flexible solutions that can be adapted to diverse situations. UCC has a number of tools to assist researchers, such as UC3Merritt (for the management, archiving and sharing of digital content) and the Web Archiving Service, which allows researchers to capture, analyse and archive websites used in the course of their research. More information is available on the UCC website.
The minute madness session gave poster demonstrators one minute to encourage delegates to view their posters and vote for them! Many posters represented projects of interest to KAPTUR, and I enjoyed wandering around and exploring the display later in the afternoon. Posters of interest included ‘Creating an Online Training Module on Research Data Management for the University of Bath’ (training in research data management is something that KAPTUR project partners will certainly need to consider in the future) and the poster for IMEJI, an open-source software tool from Germany providing free storage, sharing and metadata creation for audiovisual content, which I can see being of use in a visual arts research data context.
Day Two: Wednesday 16th January
‘Institutional Research Data Management’
On the second day I chose from a programme of parallel sessions. In the morning I learnt about the journeys professionals from the Universities of Bath, Edinburgh, Nottingham and Oxford had been on to create, implement and improve research data management capabilities at their institutions. Amongst much useful information, I learnt that the University of Edinburgh has created MANTRA, an online learning module available under an open licence so that it can be rebranded and reused by others. Thomas Parsons from the University of Nottingham commented that researchers typically store their data in five places, which emphasised to me the need for research data management training and the value of modules such as MANTRA. From surveying researchers, James Wilson from the University of Oxford found that types of data he had expected to be in a minority were actually used more frequently than expected. I wondered whether we could also expect this with visual arts research data.
‘Arts and Humanities Research Data’
In the afternoon there was a chance to hear about Arts and Humanities Research Data, and an overview of KAPTUR was given by Carlos Silva from the University for the Creative Arts. Following this, Marieke Guy gave a presentation entitled ‘Pinning it Down: towards a practical definition of ‘Research Data’ for Creative Arts Institutions.’ This talk discussed work done by the DCC in collaboration with UAL to explore the nature of visual arts research data. Marieke reflected on the fact that whilst there is much consensus on research data in the sciences, this is lacking in the visual arts. Research has suggested that arts researchers do not tend to find the term ‘research data’ useful, finding ideas such as ‘documenting the research process’ more helpful. She suggested that a definition would be useful, but that adopting a scientific vocabulary for the arts can be problematic.
The talks about Arts and Humanities Research Data were the last I was able to attend before leaving the conference, and ending on this note proved useful for reflecting on the event in terms of the KAPTUR project. What I took away from IDCC 2013 was that there is much to be gained from projects at other universities, as well as a range of existing tools that can be developed and adapted to make life easier. In the visual arts environment, however, we need to continue to think about how research data can be defined, since it doesn’t necessarily fit into the same categories as the data at the other universities I heard from at IDCC. We also need to tailor solutions to our own unique context.
With thanks to Carlos Silva, KAPTUR Technical Manager, for the following blog post. The Digital Curation Centre’s (DCC) Research Data Management Forum was held at Madingley Hall, Cambridge from 14th to the 15th November 2012; presentations from the event are available online.
“Technology aspirations for research data management”
The take-home message for the day was that IT will need to be more involved with research, and that this collaboration will have an impact on future grants, projects and sustainability.
Jonathan Tedds presented lessons learned from University of Leicester via projects such as the UK Research Data Service (UKRDS) pathfinder study and Halogen as well as from other projects such as Orbital. Jonathan covered ‘top-tips’ to get researchers’ attention and how to develop software as a service through the BRISSkit project (Biomedical Research Infrastructure Software Service kit).
Steve Hitchcock covered lessons learned from DataPool on building RDM repositories. The project was specifically concerned with SharePoint and EPrints; however, KAPTUR did get a mention as an example of another project using EPrints rather than re-inventing the wheel. An application in the EPrints Bazaar called Data Core, published in July 2012:
“Changes the core metadata and workflow of EPrints to make it more focused for use as a dataset repository. The workflow is trimmed for simplicity. The review buffer is removed to give users better control of their data.”
Paul O’Shaughnessy from Queen Mary, University of London, spoke about how their IT services are changing and how different parts of the university needed to be involved in making this happen. The University currently has around 16,000 students; it started an IT transformation programme because the original set-up was not fit for purpose – for example, there were 7 different email systems. After creating a strategic plan for the next 5 years, they realised that a third of their funding income comes from research grants, so investing in IT infrastructure to support this was crucial. They were investing 3–4%, whereas other Russell Group universities tend to invest 5–10%. They followed a greenfield approach and mentioned the importance of letting staff know that this was not just another project and that it would not involve IT alone. A striking figure: 25% of HSS grant applications were lost because of poor IT sections.
The aim of the Janet brokerage services is to become a community cloud of available resources, by:
- developing frameworks and procurement structures such as DPS to facilitate access to services
- working with DCC and JISC to ensure sensible requirements and priorities
- hoping to reach a conclusion about these services early next year (Janet is currently in talks with Google and AWS; Dropbox and Microsoft Azure will probably follow)
There was a comment about limitations with Dropbox, but also the possibility that universities may be able to use it in the future, once the current issues of storing research data outside the EU are overcome.
Other topics and interesting points from the discussion:
- Suggestion that just as there are Faculty Librarians, we should have Faculty IT people.
- Recommendation to negotiate resources with IT: for example, if someone has the right skills, use that person for something more productive than fixing printers.
- A Russell Group University mentioned that 1TB of data stored over 30 years will cost close to £25,000.
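To put that last figure in perspective, a back-of-envelope calculation shows that raw disk is only a small fraction of it; all parameters below are my own illustrative assumptions, not figures from the workshop.

```python
# The quoted figure: ~£25,000 to keep 1 TB for 30 years.
quoted_total = 25_000.0  # £
years = 30
per_year = quoted_total / years
print(f"quoted cost: about £{per_year:.0f} per TB-year")

# Raw media is a small fraction of that. Assumed parameters:
copies = 3               # replicas (primary + backup + offsite)
disk_cost_per_tb = 100.0 # £/TB at purchase
refresh_every = 5        # hardware refresh cycle, years
media_per_year = copies * disk_cost_per_tb / refresh_every
print(f"media alone: about £{media_per_year:.0f} per TB-year")
# The large remainder covers power, space, staff, backup
# infrastructure and ongoing curation.
```

The gap between the two numbers is the real lesson: long-term preservation cost is dominated by people and infrastructure, not by the disks themselves.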
Break-out session on the Engineering and Physical Sciences Research Council (EPSRC)
There was discussion about the research data that the EPSRC expects projects to make available. Attendees mentioned the importance of joining up and gathering together all metadata, of bringing IT together, and of a drip feed of information (for example through OAI-PMH, SWORD and other protocols that transfer information and allow metadata to be harvested).
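As a sketch of what such protocol-based drip feeding looks like in practice, the snippet below constructs an incremental OAI-PMH ListRecords request and parses a minimal, invented response. The endpoint URL is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# OAI-PMH harvesting is a plain HTTP GET against a repository's OAI
# endpoint; the "from" argument makes the harvest incremental.
base = "https://eprints.example.ac.uk/cgi/oai2"  # hypothetical endpoint
params = urlencode({
    "verb": "ListRecords",
    "metadataPrefix": "oai_dc",
    "from": "2012-01-01",
})
harvest_url = f"{base}?{params}"
print(harvest_url)

# A minimal, illustrative slice of an OAI-PMH response:
sample = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:eprints.example.ac.uk:1234</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>"""

ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
root = ET.fromstring(sample)
ids = [h.text for h in root.findall(".//oai:header/oai:identifier", ns)]
print(ids)
```

Because the same request shape works against EPrints, DSpace, CKAN (via its harvester) and most other repositories, a central catalogue can gather metadata from all of them with one small piece of client code.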
Overall it was a good workshop which provided different points of view but at the same time made me realise that all the institutions are facing similar issues. IT departments will need to work more closely with other departments, and in particular the Library and Research Office in order to secure funding and make sustainable decisions about software.
Finally, a flexible yet intelligent approach should be taken by IT: for example, PRINCE2 methods do not fit research projects, as these change throughout their duration. An Agile methodology should be used instead, and IT should be expected to have knowledge of and involvement in it.
With thanks to Jacqueline Cooke, Librarian (Acting), Goldsmiths, University of London, for the following blog post. This workshop was held primarily for the new JISCMRD Research Data Management training projects (2012-13), however other JISCMRD projects were invited to attend, and Jacqueline Cooke kindly represented KAPTUR.
The themes of the day were:
- Librarians’ role in RDM training
- design of training sessions
- components of good research data management
- options for publishing data
The first presentation was from the Digital Curation Centre’s (DCC) Research Data Management Skills Support Initiative – Assessment, Benchmarking and Classification (DaMSSI-ABC) project (2012-13). This project has an overarching brief to support and improve coherence in the development, dissemination and reuse of research data management training materials developed by the JISC RDMTrain projects (2010-11). They will also make links with existing initiatives that promote information literacy for researchers, such as the Research Information and Digital Literacies Coalition (RIDLs) and Vitae, referring to the Vitae Information literacy lens (PDF) on the Vitae Researcher Development Framework.
On a practical level they will support classification and deposit of projects’ training materials into JORUM so they are more easily discoverable for reuse through a JORUM ‘lens’. They will also work strategically to:
- make links with relevant professional bodies
- develop criteria for ‘peer review’ of training courses
- add RDM training to the career profile of librarians
Librarians’ role in RDM training
The strand supports the role of librarians in RDM training, as an extension of their information literacy portfolio and building on their professional ability to act as signposters. See also the Reskilling for Research (PDF) report by Mary Auckland for Research Libraries UK.
RDMRose is looking at taught and CPD learning for information professionals. Initially they suggest that librarians have the potential to carry out RDM training, but will need to extend their professional identity and build on their existing roles and skills. Many lack knowledge of research culture and need to understand it in order to be trusted.
At the University of East London (UEL) the RDM project builds on the Library’s established lead in RDM. They point out that the Library has a reputation for collaborative projects; it is credible; it has proven expertise in collecting, cataloguing and compliance (copyright, managing the CLA licence); it values sharing and cares about impact through citation; and it runs the repository.
Design of training sessions
The SoDaMaT (Sound Data Management Training for electronic music) project, Queen Mary, University of London and UEL talked about the design of training sessions. There was general discussion as many attendees had previous experience. All were cautious about generic workshops, as researchers in different disciplines or departments will work in different contexts and environments and so advised considering who the training is aimed at.
Tips from experienced attendees:
- keep it short (1 hour optimum)
- include technical basics, formats, storage, use of folders as well as theory
- attach it to other training that is seen as essential or valuable
- don’t call it ‘digital preservation’
- fit it into existing research skills programmes especially for post-graduate researchers
- check consistency of advice with other training on RDM e.g. qualitative data training
- provide online as well as face-to-face sessions and integrate them
Buy-in from your institution’s senior management team is essential, and they are now more likely to be receptive due to the current high profile of the Finch report, Open Access agendas and the impact on research funding. SoDaMaT suggest that researchers are engaged by evidence: they use dramatic stories of data loss and point out the IPR consequences of ‘curation in the cloud’. The University of Leicester’s RDM web page is presented as a scenario:
“What would you do if you lost your research data tomorrow? RDM isn’t principally about complying with policy. It means helping you to complete your research, share the research and get credit for what you have done.”
Effective advocacy emphasises the value of RDM to researchers to make the business case for introducing training:
- saves researchers’ time looking up previous work
- helps you get funding
- it is like ethics: doing it well will enable you to do your research better (UEL)
- sends your research into the future, enables citation of data along with articles
Components of good RDM
Good practice in RDM has usually been boiled down to four steps, variously:

| | Step 1 | Step 2 | Step 3 | Step 4 |
|---|---|---|---|---|
| University of Leicester | Create | Organise | Access | Look after |
| IHR/JISC | Start early | Explain it | Store it safely | Share it |
Further details from: SoDaMaT’s wiki ‘Online training materials’, University of Leicester’s RDM page, the JISC ‘Incremental’ project page, LSE/Cambridge/IHR/ULCC’s ‘Sending your research material into the future’ project.
Options for publishing data
The trainer needs to ask questions about the data and about working practices and agree a definition of data, because “researchers have many ways to approach RDM on their own terms” (UEL).
- What data is available? (e.g. in science raw data/usable data/datasets/supporting material/all worked data)
- Who decides what data to save and give access to? (Referee? RCUK? PI?)
- Where can data be published? (national data archives/learned societies website/institutional repositories/journals). Not all of these will be available in all disciplines.
There was a discussion of data publication issues, covering:
- Culture change, how much awareness is there of the issues of RDM?
- Citation of data supporting published articles works well if publishers hold it, as the data package then gets a DOI (see the Dryad project)
- Publishers/learned societies say they will do what communities want, therefore there is an opportunity to influence development of other players
- Publishers should not take IPR of data, advise use of a CC-BY licence if possible
- EPSRC institutional ‘Roadmap to research data management’ includes training
With thanks to Anne Spalding, KAPTUR Project Officer, University for the Creative Arts, for the following account of DataCite’s Managing Sensitive Data workshop, The British Library, London, 29th October 2012.
On Monday 29th October I attended my first DataCite workshop; this particular workshop is the third in a series. Slides from this and previous workshops are available via The British Library Datasets web pages.
During the morning session there were four presentations followed after lunch by a workshop where four groups focussed on data management scenarios. Feedback from the workshops and a general discussion rounded off the day.
The first speaker, Veerle Van den Eynden, spoke about managing sensitive data from the UK Data Archive’s experience. She explained in broad terms the legal aspects, and also the role that research ethics, data archives and repositories play in the management of research data.
Jonathan Tedds from the BRISSkit project spoke of managing medical and personal data. As part of the project a survey of 3000 staff was conducted in 2010 regarding their own use and re-use of research data. In due course a summary of their findings will be available as part of the project outcomes. Jonathan emphasised the need to make the process of depositing data more engaging for researchers. Jonathan mentioned work in managing research data undertaken by the University of Virginia Library.
From UKOLN, Cathy Pink gave a very interesting presentation on working with commercial partners as part of the Research 360 project. One focus of the project is the issues and challenges that arise from private-sector partnerships and research collaborations. Cathy illustrated the different collaboration agreements in place at the University of Bath. Another important aspect of citing and discovering research data is the use of metadata, and Cathy cited Sally Rumsey’s work ‘Just Enough Metadata’.
The final presentation was given by Brian Matthews of the Science and Technology Facilities Council (STFC). His talk focussed on issues in research ethics arising from data sharing, and on the fact that we are working in a political environment. He referred to the Opportunities for Data Exchange (ODE) project and a paper entitled ‘Ten Tales of Drivers and Barriers in Data Sharing’.
One of the main discussion points emerging from the workshops and feedback was the use of Digital Object Identifiers (DOIs). A particular issue was assigning a DOI to a single object that could change over time: how should this be recorded, and is another DOI required? Could an umbrella DOI be assigned for the whole object while somehow allowing for changes? Solutions might depend on work practices within institutions.
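One pattern that addresses this, used for example by services such as Figshare, is to mint a suffixed DOI per version alongside an ‘umbrella’ DOI for the evolving object. A minimal sketch of the naming scheme, with a hypothetical DOI prefix:

```python
# One DOI resolves to the latest/whole object, while each revision
# gets its own suffixed DOI that always points at a fixed snapshot.
# The prefix 10.99999 is hypothetical, for illustration only.
PREFIX = "10.99999"

def umbrella_doi(dataset_id):
    """DOI for the dataset as a whole (resolves to the latest version)."""
    return f"{PREFIX}/{dataset_id}"

def version_doi(dataset_id, version):
    """DOI for one fixed, citable snapshot of the dataset."""
    return f"{PREFIX}/{dataset_id}.v{version}"

print(umbrella_doi("kaptur-sketchbooks"))    # the evolving object
print(version_doi("kaptur-sketchbooks", 2))  # a fixed snapshot
```

Under this scheme a citation can be as stable or as current as the citer needs: cite the versioned DOI to fix the exact data used, or the umbrella DOI to point readers at the latest state.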
This event gave me further insight into the complexities of managing research data. The variety of perspectives also demonstrated that we are all grappling with the same issues, but may well adopt different solutions depending on the institutional environment.