Working in Stages with DataStage and Figshare
Posted: October 20, 2012
With thanks to Carlos Silva, Technical Manager, for the following blog post:
The KAPTUR Technical Analysis report (PDF) recommended the piloting and further investigation of two different systems: DataStage to EPrints; and Figshare to EPrints.
Figshare to EPrints
Some of the advantages of integrating Figshare with EPrints are:
- The Figshare team is currently working on a desktop uploading tool to allow users a streamlined process of submission.
- Feedback from the Steering Group was that the Figshare user interface is attractive and clear. Researchers already use it to store and manage research data, so integration with EPrints (the major repository platform in the UK) would enable many institutions to encourage researchers to manage their research data better and then upload selectively to an institutional repository for publication.
Following telephone and Skype chats with the Figshare team, a requirements document was created and shared with project partners and Simon Hodson. The idea was to create an API which would be free for use by any institution that wanted to link Figshare with an EPrints repository using the SWORD 2 protocol. Additional features included the development of the desktop uploader; a custom user interface design; back-end application development; and custom user accounts for the KAPTUR project partners to test the system.
Negotiations are still in progress. Further thought has been given to the infrastructure and pricing models, which will eventually have an impact when adopting a commercial service such as Figshare and which, if not considered, could lead to an unsustainable solution for the sector.
DataStage to EPrints
The second pilot recommended by the report was to link DataStage (from the JISC-funded DataFlow project*) with EPrints. The technical implementation of this pilot started in June 2012, when the Technical Manager set up DataStage and DataBank on a local machine, demonstrated them to the Project Officers (in June) and the Steering Group (in July), and started collecting feedback. After testing the DataFlow software internally, the team started to explore the best way of linking DataStage with EPrints directly.
The advantages of integrating DataStage with EPrints are:
- DataStage offers the potential of being institutionally based, and therefore of tighter control.
- It provides a structured metadata collection interface.
- It also provides flexibility when uploading, for example through the integration of a shared drive, a popular storage approach similar to Dropbox but with the advantage that the data is held on the institution’s servers.
The Technical Manager, through VADS’ host institution, the University for the Creative Arts, set up a test environment for the KAPTUR project (http://kaptur.ucreative.ac.uk). Test accounts have been given to project partners, and an online feedback form has been set up to capture their feedback.
To test the DataStage connection with EPrints, a test repository with the latest EPrints version (3.3.10) was needed in order to use the SWORD 2 protocol; this was created (http://kaptur_repo.ucreative.ac.uk).
Both systems have been tested separately, and both systems have performed well.
The DataStage software should allow users to submit entire folders as ‘packages’ to a repository using the SWORD 2 protocol; however, there is currently an issue** with the default version of DataStage, and transfers cannot be made to any repository other than DataBank (the DataFlow project’s repository).
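To give a rough idea of what submitting a folder as a SWORD 2 ‘package’ involves, here is a minimal Python sketch. It is not DataStage's own code and has not been run against the KAPTUR test repository. It zips a folder and prepares (without sending) a binary deposit request using headers described in the SWORD 2.0 profile. The function names `zip_folder` and `build_deposit_request` and the collection URL in the usage note are hypothetical, and the hex-encoded MD5 checksum follows common SWORD client practice rather than anything confirmed for EPrints.

```python
import hashlib
import io
import os
import urllib.request
import zipfile

# Packaging identifier for a plain zip of files, as named in the SWORD 2.0 profile.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def zip_folder(folder):
    """Pack every file under `folder` into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                path = os.path.join(root, name)
                # Store paths relative to the folder being packaged.
                archive.write(path, os.path.relpath(path, folder))
    return buffer.getvalue()


def build_deposit_request(collection_url, package, filename, in_progress=False):
    """Prepare (but do not send) a SWORD 2 binary deposit as an HTTP POST."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": "attachment; filename=" + filename,
        # Assumption: checksum sent as a hex digest, as common SWORD clients do.
        "Content-MD5": hashlib.md5(package).hexdigest(),
        "Packaging": SIMPLE_ZIP,
        # "true" asks the repository to keep the deposit open for further changes.
        "In-Progress": "true" if in_progress else "false",
    }
    return urllib.request.Request(
        collection_url, data=package, headers=headers, method="POST"
    )
```

A deposit would then be prepared with something like `build_deposit_request("http://repository.example.org/sword/collection", zip_folder("my_dataset"), "my_dataset.zip")` and sent with `urllib.request.urlopen` once authentication (typically HTTP Basic) has been added. The real collection address has to be read from the repository's SWORD service document.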
As well as contacting DataFlow and EPrints, the Technical Manager has been in contact with various colleagues across the sector, from the Centre for Digital Music at Queen Mary, University of London (see blog post about connecting DataStage with DSpace) to other colleagues who have also looked into connecting DataStage with EPrints such as the UK Data Archive, University of Essex and the RoaDMaP project, University of Leeds.
At this point the conclusions are as follows:
- EPrints 3.3 is required in order to have SWORD 2 fully enabled [completed].
- EPrints have tested the SWORD 2 protocol successfully with other EPrints repositories; however, they have not yet tested connectivity with other types of repository.
- The DataFlow project manager replied that there were issues with SWORD submission on the DataStage side, but that they expected to come up with a workaround for their version 1.0 release [It is noted that Richard Jones will be presenting about DataFlow at the JISCMRD Nottingham programme event so this is hopeful!!].
- The lead DataStage developer mentioned that SWORD 2 was envisioned to work fully with DataStage and EPrints when it becomes available, and that previous versions of DataStage worked with EPrints; however, because of new developments and enhancements at either end, some changes on the DataStage side are needed before it fully complies and can connect with EPrints.
*DataFlow was funded by JISC, under the University Modernisation Fund, from June 2011 – May 2012 to further develop a prototype out of the JISC-funded ADMIRAL project (2009-11).