EPrints support for CERIF in Action

CERIF in Action (CIA) requires support for translating the native system data to and from CERIF 1.4. EPrints has long had excellent support for importing and exporting data, based on a system of plugins. These include support for XML and JSON representations, Dublin Core, BibTeX and others.

While EPrints is orientated around publications (“eprint” records), CERIF/CIA requires support for other entities. Within EPrints these might be people, projects and organisations. (CERIF covers other things, e.g. equipment, but I’m not aware of anyone using EPrints to record that data.)

Representations of people in EPrints may mean user records (accounts that people log in to) or the names attached to publications as authors, editors or contributors. The concept of a user is distinct from that of an author: for example, the user may be a secretary who deposits on behalf of faculty. Most of the time, therefore, a “user” is not going to be exported via CERIF. Instead, the names of authors/editors/contributors are exported. If a name has an identifier attached to it, that identifier is used to uniquely identify the person.
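To make the identifier rule concrete, here is a minimal sketch in Python. It is not EPrints code: the `Creator` class, its `identifier` field and both function names are hypothetical, and lower-casing the identifier is an assumption about how two spellings of the same identifier might be matched.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Creator:
    """A name attached to a publication (hypothetical structure)."""
    family: str
    given: str
    identifier: Optional[str] = None  # assumed: an email address or staff ID


def person_key(creator: Creator) -> Optional[str]:
    """Return a stable key when an identifier is present, otherwise None."""
    if creator.identifier and creator.identifier.strip():
        # Assumption: identifiers are compared case-insensitively.
        return creator.identifier.strip().lower()
    return None


def collect_people(creators: List[Creator]) -> Tuple[Dict[str, Creator], List[Creator]]:
    """Split creators into uniquely identified people and anonymous names."""
    identified: Dict[str, Creator] = {}
    anonymous: List[Creator] = []
    for creator in creators:
        key = person_key(creator)
        if key is None:
            # No identifier: the name can only be exported as an anonymous record.
            anonymous.append(creator)
        else:
            # First occurrence wins; later names with the same key are the same person.
            identified.setdefault(key, creator)
    return identified, anonymous
```

With this approach, two spellings of a name that share an identifier collapse into one person, while a name without an identifier stays a separate, anonymous entry.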

Projects and organisations are reflected in the EPrints projects and funders publication fields. These have existed for some time but are only free-text fields – that’s how we stored that data in the Southampton EPrints. Like authors, these can be exported in CERIF but only as an anonymous record. The experience gained with CIA will allow us to improve support for projects and organisations (in particular grant numbers!).

As the world seems to be moving towards CRIS systems, we want to improve the support for capturing non-publication research data in EPrints. While authors, projects and funders are currently properties of publications, they would instead be represented by first-class objects with their own data schemes. The balancing act is between flexible, fuzzy data and strict but more complex data. Schemes like CERIF really want strict data, but we don’t want to confront users with time-consuming submission forms and workflows.

The second, trickier part of CIA is support within the user interface tools to allow the user or administrator to import and export complex sets of records. The user needs to be able to pick and choose the records they want, for both import and export. When importing records, the system needs to detect existing duplicate records and provide a means for the user to merge or resolve those duplicates.

Because the CERIF export is just another export plugin, users and administrators can use any existing search or view to construct a set of records to export. CIA calls for the ability to pick individual records, which can be achieved by installing the Shelves extension (available on the EPrints Bazaar).

Import is always more complex than export. As part of the CIA activity the EPrints import interface has undergone some re-working. Because CERIF can represent multiple types of object, the import has had to be re-worked to support importing multiple object types at the same time. Instead of all-or-nothing, the interface now provides a list of objects found in the import, which the user can then pick from. Work is currently ongoing to detect and support merging duplicate objects, which is not a trivial task.
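The pick-from-a-list idea can be sketched roughly as follows. This is an illustration in Python, not the EPrints import code: the record dictionaries, the `preview_import` and `select` functions, and the duplicate check are all assumptions. In particular, matching on a normalised title is a deliberately naive stand-in for real duplicate detection, which is the hard part.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def normalise(title: str) -> str:
    """Lower-case a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def preview_import(found: List[dict], existing: List[dict]) -> Dict[str, List[dict]]:
    """Group imported objects by type and flag possible duplicates.

    Each record is assumed to be a dict with 'type' and 'title' keys.
    """
    known = {(r["type"], normalise(r["title"])) for r in existing}
    preview: Dict[str, List[dict]] = defaultdict(list)
    for index, record in enumerate(found):
        key = (record["type"], normalise(record["title"]))
        preview[record["type"]].append({
            "index": index,  # position in the import, used for selection
            "title": record["title"],
            "possible_duplicate": key in known,
        })
    return dict(preview)


def select(found: List[dict], chosen_indexes: Iterable[int]) -> List[dict]:
    """Return only the objects the user picked, in import order."""
    return [found[i] for i in sorted(set(chosen_indexes)) if 0 <= i < len(found)]
```

The preview separates the "what was found" step from the "what gets imported" step, so a flagged duplicate can be skipped or handed to a merge step rather than being imported blindly.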

