With thanks to Carlos Silva, KAPTUR Technical Manager, for the following blog post.
On 18th February I attended a workshop led by the JISC-funded Orbital project to gather information about the open source software CKAN and how it could be used to support research data management in the academic sector.
The workshop started with a presentation from Mark Wainwright (community co-ordinator for the Open Knowledge Foundation) on the latest release of CKAN, its origins and potential in the academic community.
One of the big advantages of using CKAN is that the ‘core’ system is surrounded by APIs, making it flexible enough to accommodate different user and institutional needs. This means that the core software can be updated without affecting the APIs or having to adapt external code to fit with the core software.
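As a rough illustration of what working against the API rather than the core looks like, the sketch below builds a search request for CKAN's `package_search` action and reads dataset titles out of the JSON reply. It assumes the Action API (version 3) exposed by CKAN 2.x; the site address `https://ckan.example.org/` and the sample response are invented for the example, so nothing here is taken from the workshop itself.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, query, rows=5):
    """Build a URL for CKAN's package_search action (Action API v3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Pull the dataset titles out of a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Hypothetical CKAN site; replace with a real instance before use.
url = build_search_url("https://ckan.example.org/", "visual arts", rows=2)

# Canned reply with the same overall shape as an Action API response,
# so the sketch runs without network access.
sample = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "sketchbooks", "title": "Sketchbook scans"}],
    },
})
titles = dataset_titles(sample)
```

Because external code like this only depends on the URL and the JSON structure, it would not need to change when the core software is upgraded, provided the API itself stays stable.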
Another promising feature is CKAN’s ability not only to harvest other CKAN databases but also to search other types of repositories. The mechanism covers a range of sources: EPrints and DSpace, but also geospatial servers, web catalogues and other HTML index pages.
In terms of sustainability, CKAN has been developed over the last 6 years, so it is now relatively mature, with an extensive and very streamlined workflow for adding features, fixing bugs and enhancing the core services. The latest version, 2.0 (recently released as a beta), promises to be an exciting release with more visually enhanced tools, an improved groups feature, customisable metadata and a rich search experience based on Apache Solr.
The workshop continued with a presentation from the data.bris project at the University of Bristol. It is amazing to note that each Principal Investigator can apply for up to 5TB of storage, free of charge and securely backed up for 20 years!
Academics receive a mapped network drive which they can use to deposit content; however, managing research data requires additional features. The data.bris project was therefore interested in CKAN due to its flexibility, data access (ability to have private datasets), organisation schema, ability to share with external researchers and the CKAN search engine.
In the future, the University of Bristol is considering two instances of CKAN, one for a public read-only catalogue of research data publications and another for controlled access (which would include teaching and other types of data).
The third presentation was from Orbital; Project Manager Joss Winn provided a virtual tour of the latest tools developed by the project. They have connected CKAN to other systems: to their EPrints repository and also to different departmental databases, such as an awards management system.
The Orbital set-up gives their researchers a central place for different types of data, including the policies, profiles, publications and analytics information from specific outputs, making the most of the CKAN software.
The demonstration included mention of the software created to enable deposit of data from CKAN to their EPrints repository, something which we have been anticipating for the last few months and which is an exciting development for the sector. Orbital have released the code through GitHub, and in theory it should work with CKAN version 1.7. The functionality enables CKAN to submit the metadata to EPrints using the SWORD2 protocol, but not the actual files themselves; instead, a link is added to EPrints which points back to the files deposited in CKAN.
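To make the metadata-only deposit more concrete, here is a minimal sketch of the kind of Atom entry a SWORD2 client could send: descriptive fields expressed as Dublin Core terms, plus a link pointing back at the dataset in CKAN. This is not Orbital's code. The function name, the example values, the dataset URL and the choice of `rel="related"` for the link are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"

# SWORD2 metadata-only deposits are sent as Atom entries with this type.
ATOM_ENTRY_CONTENT_TYPE = "application/atom+xml;type=entry"

# Readable prefixes in the serialised XML instead of ns0, ns1, ...
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def build_metadata_entry(title, creator, abstract, ckan_dataset_url):
    """Return an Atom entry (as a string) describing a dataset held in CKAN.

    Hypothetical helper: only metadata is included, and the files stay
    in CKAN, referenced through a link element.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}creator").text = creator
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}abstract").text = abstract
    # Assumed representation of the link back to the files in CKAN.
    ET.SubElement(
        entry,
        f"{{{ATOM_NS}}}link",
        {"rel": "related", "href": ckan_dataset_url},
    )
    return ET.tostring(entry, encoding="unicode")


entry_xml = build_metadata_entry(
    title="Studio sketchbooks",
    creator="A. Researcher",
    abstract="Scanned sketchbooks from a practice-based project.",
    ckan_dataset_url="https://ckan.example.org/dataset/studio-sketchbooks",
)
```

A client would then POST `entry_xml` to the repository's collection URL with `ATOM_ENTRY_CONTENT_TYPE` as the content type; that step is left out here because it needs a live repository and credentials.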
The Orbital team are proposing a two year roadmap to their senior management team to take responsibility and carry this project forward and embed it further into the University of Lincoln’s infrastructure.
During the group discussion session, workshop participants suggested a comprehensive list of about 80 tools, features, amendments and requests that we would like to see as part of a new version of CKAN (a Google Docs spreadsheet is available: http://lncn.eu/mxz2). Again in groups, we carried out a gap analysis for the specific items requested, and a CKAN expert was available to answer any questions.
As an academic community we found that there were lots of similar challenges which should be easier to address collaboratively.
From the visual arts community perspective, although CKAN can’t currently address all the requirements from our user requirements list (PDF), there is scope for further development and this is continuing in the right direction.
This is our update for the end of the thirteenth month of KAPTUR.
WP1: Project Management
- The whole Project team met on the 13th November at The Glasgow School of Art.
- Over the last month we have been managing the challenge of two of the four Project Officers resigning from the project. John Murtagh was part-time at University of the Arts London (UAL) and has successfully applied for a full-time role at the University of East London working on their RDM training project (starting on 26th November). Tahani Nadim has been awarded her PhD and has accepted a post-doc position at another institution which will begin in the New Year; interviews with internal candidates are scheduled for December.
- On 14th November the Project Manager met with colleagues at the UAL, including John’s replacement, Sarah Mahurter, Manager of the University Archives and Special Collections Centre. Betty Woessner, Research Systems and Data Manager, will work with the DCC on the Institutional Engagement project.
WP3: Technical Infrastructure
- The Technical Manager attended the JISCMRD programme event, 24th-25th October 2012, Nottingham. It was an opportunity to share the technical work that we have been piloting and also to learn from other projects. Following a presentation from Richard Jones, representing the DataFlow project, and a practical hands-on workshop, there was no resolution to the fact that DataStage is unable to connect with EPrints.
- The Technical Manager has created a test instance of CKAN as this appears to be a way forward with a stronger case for long term sustainability as well as building on the work of University of Lincoln’s Orbital project.
- University of the Arts London have reported that their policy does not need to be approved by the Academic Board, so this completes their delivery of WP4: http://www.arts.ac.uk/research/data-management/
- University for the Creative Arts and Goldsmiths, University of London have had their draft policies approved at the same level as UAL; however, these now need to go on to their Academic Boards in January for final approval.
- The Glasgow School of Art have revised their timescale for the policy due to the recruitment of two key staff who they want to feed into the policy; this is now expected to be approved at their Research and Knowledge Exchange Committee meeting in February. Academic Board approval is not required.
- The four policies will be made available through DCC in due course (UAL’s policy is already available via the link above).
WP5: Training and Support
- The first KAPTUR training workshop was held at UAL on Monday 19th November, with support from Marieke Guy and Joy Davidson from the DCC (due to the Institutional Engagement work). Further details and a list of attendees are available here: http://ualrdm-eorg.eventbrite.co.uk/ Presentations are available online here: http://slidesha.re/QTrHcs http://slidesha.re/SnzvBL http://slidesha.re/QnwQIq
- The further three KAPTUR training workshops are scheduled as follows: 27th November (Goldsmiths) with follow-up in January; 30th November (GSA) with follow-up in January; 16th January (UCA).
- Feedback is being gathered from participants at each workshop as well as from the Project Officers themselves; this will then lead to refinements of the KAPTUR training plan.
- The materials used as well as the training plan will be reviewed, re-purposed and re-packaged for use in common Virtual Learning Environments and also for deposit to JORUM. This will form the KAPTUR toolkits.
WP6: Evaluation and Sustainability
- Two of the four case studies have been completed to a very good draft stage. The UAL and Goldsmiths Project Officers were asked to focus on this aspect of the project ahead of schedule in order to capture their knowledge before they leave. Their successors will make any adjustments required.
- The new UAL Project Officer and the Project Manager are attending the JISCMRD Benefits programme event in Bristol, 29th-30th November.
- Both the IDCC13 paper and poster proposals were successful.
- The Technical Manager presented at the JISCMRD programme event on 24th October, Nottingham (Carlos’ presentation). The Project Manager also presented a poster (available with audio explanation here) and was part of the Selecting and Appraising Research Data session on 25th October (blog post).
- Jacqueline Cooke attended the RDM Training workshop on 26th October (blog post).
- Anne Spalding attended the DataCite workshop ‘Managing Sensitive Data’ on 29th October (blog post).
- Carlos Silva attended the RDM Forum ‘Shaping the infrastructure’, 14th-15th November (blog post).
The following blog post is by Carlos Silva, Technical Manager for Kaptur:
The Hack Day started with quick presentations from attendees to find out about our projects, our interests, pose questions and to start assembling teams who shared similar ideas, ambitions and problems.
By the end of the afternoon we were allocated a team and a task to do and started working on a particular problem.
There were four teams which covered the following topics:
- Stakeholder Driven Metadata
- Dropbox for Institutions
- SWORD 2 protocol and Bit Torrent
- Data collection from research activities
1. Stakeholder Driven Metadata
Using a metadata map, we tried to map between different schemas, such as Dublin Core, OAI-PMH and the British Library’s.
Looking at this from a user’s perspective, users will need to follow a certain workflow, for example using a DMP and so on (N.B. view prezi about this).
The team also worked on an example to show different types of handling DOIs and metadata between different schemas: http://homes.ukoln.ac.uk/~ab318/datacite/
I mentioned that the Kaptur project involves creating a model of best practice in the management of visual arts research data, and that using different types of metadata schemas was a problem for some institutions. I also mentioned that researchers in our sector need to handle not only large amounts of data but also different types of data, along with metadata schemas and fields that may not be covered by the default Dublin Core or OAI-PMH schemas.
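The mapping problem can be sketched as a simple crosswalk: a table saying which field in one schema corresponds to which field in another, with anything that has no counterpart set aside rather than silently dropped. The table below pairs a handful of Dublin Core elements with DataCite property names purely for illustration; it is not an official crosswalk and not the map the team worked on, and the `exhibition_venue` field is a made-up example of an arts-specific field.

```python
# Illustrative, incomplete mapping from Dublin Core elements to
# DataCite property names (an assumption, not an official crosswalk).
DC_TO_DATACITE = {
    "creator": "creators",
    "title": "titles",
    "publisher": "publisher",
    "date": "publicationYear",
    "identifier": "identifier",
}


def crosswalk(record, mapping):
    """Split a record into fields the target schema covers and fields it does not."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = value  # no equivalent in the target schema
        else:
            mapped[target] = value
    return mapped, unmapped


record = {
    "title": "Studio sketchbooks",
    "creator": "A. Researcher",
    "date": "2012",
    "exhibition_venue": "Degree show",  # hypothetical arts-specific field
}
mapped, unmapped = crosswalk(record, DC_TO_DATACITE)
```

Here `unmapped` ends up holding the venue field, which is exactly the kind of information that a default schema has no place for.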
Finally there was an unofficial launch of the Journal of Open Research Software: http://openresearchsoftware.metajnl.com
2. Dropbox for Institutions
Sparkleshare was mentioned during the presentation, but it was noted that it is too unstable for use in production environments.
A blogpost is available here with more information: http://blogs.bath.ac.uk/research360/2012/05/mrd-hack-days-file-backup-sync-and-versioning-or-the-academic-dropbox/
3. SWORD 2 protocol and Bit Torrent
SWORD 2 is a protocol for depositing content and its metadata with a repository.
The issue for this group to discuss was how to enable any type of file to be deposited.
Big deposits can take a long time to transfer; this isn’t a problem in itself, but there are problems around it. For example, partial uploads are possible; however, if the transfer is interrupted, the repository will not be able to create a record.
Using SWORD and Bit Torrent, the team tried to tackle the problem by splitting the file into chunks, so that large files can be submitted and uploaded to the server despite interruptions.
Advantages could be found immediately: it is secure, you can track it and also limit the number of uploads.
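The chunking idea can be sketched without either protocol: split the file into fixed-size pieces, record a hash for each piece (as Bit Torrent does), and let the receiving side keep track of which pieces have arrived so that an interrupted transfer only needs to resend what is missing. The class and function names below are invented for this example and are not taken from the team's code.

```python
import hashlib


def split_into_pieces(data, piece_size):
    """Cut a byte string into consecutive pieces of at most piece_size bytes."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces):
    """SHA-1 digest per piece, so each piece can be verified on arrival."""
    return [hashlib.sha1(piece).hexdigest() for piece in pieces]


class ResumableUpload:
    """Hypothetical server-side record of which pieces have been received."""

    def __init__(self, expected_hashes):
        self.expected_hashes = expected_hashes
        self.received = {}

    def accept(self, index, piece):
        """Store a piece if its hash matches; reject it otherwise."""
        if hashlib.sha1(piece).hexdigest() != self.expected_hashes[index]:
            return False
        self.received[index] = piece
        return True

    def missing(self):
        """Indexes of the pieces that still need to be sent."""
        return [i for i in range(len(self.expected_hashes)) if i not in self.received]

    def assemble(self):
        """Join the pieces back into the original file once all have arrived."""
        if self.missing():
            raise ValueError("upload incomplete")
        return b"".join(self.received[i] for i in range(len(self.expected_hashes)))


payload = bytes(range(256)) * 40            # stand-in for a large file
pieces = split_into_pieces(payload, 4096)
upload = ResumableUpload(piece_hashes(pieces))

upload.accept(0, pieces[0])                 # transfer interrupted after one piece
remaining = upload.missing()                # only these need to be resent
for index in remaining:
    upload.accept(index, pieces[index])
restored = upload.assemble()
```

Only once `missing()` returns an empty list would the repository create its record, which avoids the half-finished deposits described above.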
This project won support from JISC, which will fund two days of further development.
4. Data collection from research activities
The concept was straightforward: when people start to upload content, information will come not only from the users, but also from the actual file itself.
The team attempted to build an API to do this; however, more time was needed to complete it.
Ultimately, the project was intended to produce one large feed reporting everything that has happened around a record, such as visits by a researcher or modifications to the file, so that the System Admin could gather all that information to create reports.
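As an indication of how much can be read from the file itself, without asking the user anything, the sketch below collects a few facts using only the Python standard library. The function name and the selection of fields are assumptions made for this example; the team's unfinished API may well have looked quite different.

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Gather basic metadata directly from a file on disk."""
    stat = os.stat(path)
    with open(path, "rb") as handle:
        checksum = hashlib.sha256(handle.read()).hexdigest()
    mime_type, _ = mimetypes.guess_type(path)  # guessed from the extension
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified": modified.isoformat(),
        "mime_type": mime_type or "application/octet-stream",
        "sha256": checksum,
    }


# Demonstrate on a small temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "notes.txt")
    with open(sample_path, "wb") as handle:
        handle.write(b"hello")
    description = describe_file(sample_path)
```

For real research data the whole file would not be read into memory at once, and format-specific details (image dimensions, embedded camera data and so on) would need dedicated libraries.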