Working in Stages with DataStage and Figshare

With thanks to Carlos Silva, Technical Manager, for the following blog post:

The KAPTUR Technical Analysis report (PDF) recommended the piloting and further investigation of two different systems: DataStage to EPrints; and Figshare to EPrints.

Figshare to EPrints

Some of the advantages of integrating Figshare with EPrints are:

  • The upload tool to Figshare allows multiple uploads using WebDAV and JavaScript.
  • The Figshare team is currently working on a desktop uploading tool to allow users a streamlined process of submission.
  • Feedback from the Steering Group was that the user interface of Figshare was attractive and clear. It is already being used by researchers to store and manage research data, so integration with EPrints (the major repository platform in the UK) would enable many institutions to encourage researchers to manage their research data better and then upload selectively to an institutional repository for publication.

Following telephone and Skype chats with the Figshare team, a requirements document was created and shared with project partners and Simon Hodson. The idea was to create an API which would be free for use by any institution that wanted to link Figshare with an EPrints repository using the SWORD 2 protocol. Additional features included the development of the desktop uploader; a custom user interface design; back-end application development; and custom user accounts for the KAPTUR project partners to test the system.

Negotiations are still in progress. Further thought has also been given to the infrastructure and pricing models that come with adopting a commercial technology such as Figshare; if these are not considered, the result could be an unsustainable solution for the sector.

DataStage to EPrints

The second pilot recommended by the report was to link DataStage (from the JISC-funded DataFlow project*) with EPrints. The technical implementation of this pilot started in June 2012, when the Technical Manager set up DataStage and DataBank on a local machine, demonstrated this to the Project Officers (in June) and the Steering Group (in July), and started collecting feedback. After testing the DataFlow software internally, the team started to explore the best way of linking DataStage with EPrints directly.

The advantages of integrating DataStage with EPrints are:

  • DataStage offers the potential of being institutionally based, and therefore of tighter control.
  • It provides a structured metadata collection interface.
  • It also provides flexibility when uploading, for example through the integration of a shared drive, a popular storage approach similar to Dropbox but with the advantage that the data is held on the institution’s servers.

Proposed integration of DataStage with EPrints, July 2012 (SlideShare)

The Technical Manager, through VADS’ host institution (the University for the Creative Arts), set up a test environment for the KAPTUR project (http://kaptur.ucreative.ac.uk). Test accounts have been given to project partners and an online feedback form has been set up to capture their feedback.

To test the DataStage connection with EPrints, a test repository with the latest EPrints version (3.3.10) was needed in order to use the SWORD 2 protocol; this was created (http://kaptur_repo.ucreative.ac.uk).

Both systems have been tested separately, and both systems have performed well.

The DataStage software should allow users to submit entire folders as ‘packages’ to a repository using the SWORD 2 protocol. However, there is currently an issue** with the default version of DataStage: transfers can only be made into DataBank (the DataFlow project’s repository), not into any other repository.
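To make the idea of submitting a folder as a ‘package’ more concrete, here is a rough sketch of what a SWORD 2 binary deposit request could look like. It is not DataStage’s own code: the function names, the example collection URL and the choice of the SimpleZip packaging format are all assumptions for illustration, and the request is only built, never sent.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import Request

# Packaging identifier defined by the SWORD 2.0 profile for plain zip files.
# Assumption: the target repository accepts this packaging format.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def zip_folder(folder):
    """Zip every file under `folder` into an in-memory archive (bytes)."""
    root = Path(folder)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder being packaged.
                archive.write(path, path.relative_to(root).as_posix())
    return buffer.getvalue()


def build_deposit_request(folder, collection_url, filename="package.zip"):
    """Build (but do not send) a SWORD 2 style POST of a zipped folder.

    `collection_url` is hypothetical: a real repository advertises its
    collection URLs in its SWORD service document.
    """
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": SIMPLE_ZIP,
        # "false" signals that the deposit is complete.
        "In-Progress": "false",
    }
    return Request(collection_url, data=zip_folder(folder),
                   headers=headers, method="POST")
```

In practice the request would also need authentication and would be sent with something like urllib.request.urlopen; both are left out here because the details depend on how each repository is configured.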

As well as contacting DataFlow and EPrints, the Technical Manager has been in contact with various colleagues across the sector, from the Centre for Digital Music at Queen Mary, University of London (see blog post about connecting DataStage with DSpace) to other colleagues who have also looked into connecting DataStage with EPrints such as the UK Data Archive, University of Essex and the RoaDMaP project, University of Leeds.

At this point the following conclusions can be drawn:

  1. EPrints 3.3 is required in order to have SWORD 2 fully enabled [completed].
  2. EPrints have tested the SWORD 2 protocol successfully with other EPrints repositories; however, they have not yet tested connectivity with other types of repositories.
  3. The DataFlow project manager replied that there were issues with SWORD submission on the DataStage side, but that they were expecting to come up with a workaround for their V 1.0 release [it is noted that Richard Jones will be presenting about DataFlow at the JISCMRD Nottingham programme event, so this is hopeful!].
  4. The lead DataStage developer mentioned that SWORD 2 was envisioned to work fully with DataStage and EPrints when it becomes available, and that previous versions of DataStage managed to work with EPrints. However, because of new developments and enhancements at either end, some changes on the DataStage side are needed before it fully complies and can connect with EPrints.

*DataFlow was funded by JISC, under the University Modernisation Fund, from June 2011 – May 2012 to further develop a prototype out of the JISC-funded ADMIRAL project (2009-11).

**A blog post at the end of August noted the action “Review Sword access problems, isolate and fix (getting external help if needed).”


KAPTUR Steering Group meeting, HEFCE, 18th July

View from HEFCE, 12th Floor, Centre Point, London. Photo: MTG

Key points from the meeting:

  • It was noted that there was diversity among the four institutions in terms of drafting the RDM policies – we can still collaborate and learn from each other – but the approach is necessarily different at each institution.
  • University of the Arts London are really benefiting from their participation in the DCC University Engagement programme; the UAL Project Officer is working an extra day per week on this and as a result has been able to revisit and extend the KAPTUR Environmental Assessment through 20 five-minute telephone calls, which will be followed up with one-hour in-depth interviews with visual arts researchers.
  • There was discussion about a definition for visual arts research data and how this might be constraining, but was needed at the same time in order to be able to move forward with the RDM policies. A working definition was presented to the KAPTUR Steering Group 3 months ago in response to questions raised by the UAL working group: http://www.slideshare.net/kaptur_mrd/kaptur-news06
  • Feedback on training/support and the KAPTUR toolkits: recommendation to create KAPTUR videos about visual arts research data instead of hosting workshops at each institution (we already had plans to re-use content from the previous JISCMRD programme e.g. http://www.youtube.com/user/GUdatamanagement). I still think the face-to-face aspect of the workshops would be useful, but maybe there is a way to incorporate shorter sessions and use the videos as part of these? We will discuss at our next project team meeting in September.
  • The Steering Group liked the Figshare interface and thought it would be appealing to visual arts researchers as well as easy to use; there were lots of questions about both DataStage and Figshare.
  • Feedback on Sustainability: recommendation to get an idea of costs of the proposed technical infrastructure to include estimates of staff time required for ongoing support of the systems.

The presentations are available from SlideShare.

The Triumphal Quadriga or Horses of St Mark, facade of St Mark’s Basilica in Venice.
Creative Commons Attribution 3.0 Unported license

It was great to welcome Laura Molloy, Researcher at the Humanities Advanced Technology and Information Institute (HATII), to the Steering Group meeting. After the meeting Leigh, Laura and I met to discuss the project from the perspective of her role as JISCMRD Evidence Gatherer. As well as discussing impact and gathering evidence about benefits, Laura also came up with the concept of the chariot (KAPTUR project) being pulled by four horses (our four institutions). I really liked this idea of the race and also the need for collaboration to be well-matched in order to make the project successful.


Meeting at Central Saint Martins, University of the Arts London, 2nd July

This, our 9th project team meeting, was a bit of an adventure from the start, as the building is so new that its postcode has not been picked up by Google Maps yet! However, the venue is easy to find, with clear markings from King’s Cross to the King’s Boulevard. Once on the 5th floor, there were spectacular views both inside and outside of the building:

CSM Granary Building, London. Photo: MTG

Key points from the meeting:

  • The Technical Manager provided a demonstration of DataFlow’s DataStage, which has now been installed on a local machine for testing purposes. There was also discussion about Figshare.
  • The UAL Project Officer spoke about DCC’s Institutional Engagement work with UAL.
  • Each Project Officer presented about two externally funded visual arts research projects (forthcoming blog post).
  • Everything appears to be on course for draft RDM policies to be approved at the Autumn Research Committee meetings; Project Officers will give short presentations about their RDM policy work at the Steering Group meeting on 18th July.
  • The Project Manager and Project Officers have collaborated on an A-Z of visual arts research data based on quotes from researchers in the KAPTUR Environmental Assessment report (forthcoming publicity and/or blog post).
  • The Project Manager is working with Angus Whyte from DCC to put together a programme for an event on ‘Selecting and Appraising Research Data’, to take place in September.
  • We discussed the timescales for producing the toolkits and institutional workshops; November was scheduled for the workshops. The Project Manager has been in conversation with Joy Davidson of DCC to find out more about the DCC training materials.

Building a pilot demonstrator service for the visual arts

The following blog post is adapted from the Conclusion and Recommendations section of the Technical Analysis report (PDF):

The KAPTUR Technical Manager investigated 17 different types of software which were compared to the requirements of the four partner institutions (details and appendices in the report). The next stage of the research reduced the choice of software to five options: DataFlow, DSpace, EPrints, Fedora, Figshare. These were all found to be suitable for managing research data in the visual arts; through a further selection process EPrints, Figshare, and DataFlow were identified as the strongest contenders.

[…] it is recommended that two pilots occur side by side: an integration of EPrints with Figshare and a separate piece of work linking DataFlow’s DataStage with EPrints. By integrating EPrints with Figshare, the project can take advantage of a system which has been built with, and for, researchers to handle research data specifically, and has a user-friendly visual interface (which is constantly evolving and enhanced by Figshare directly). […] By integrating DataStage with EPrints the research data storage and software will be hosted within each institution, providing them with better control over the type of data that can be stored, published and managed. The integration will also enable content uploaded in DataStage to be securely backed up by the institution and accessible from anywhere in the world. A ‘Dropbox’-like tool is featured in the latest beta version, providing a user-friendly interface which will benefit visual arts researchers. EPrints will effectively provide the role of DataFlow’s DataBank.