Taverna has now moved to the Apache Software Foundation. For updated information, see Apache Taverna (incubating).

SCAPE

SCAPE (SCAlable Preservation Environments) is an EU-funded FP7 project aimed at building a scalable platform for planning and executing computation-intensive processes, such as the ingestion or migration of large data sets, in order to help automate digital preservation.

The data sets involved are heterogeneous (collections of objects of different types), contain data objects that are themselves large or structurally complex, or contain a huge number of digital objects. The sheer volume of data in such archives makes it impossible to use current service-oriented architectures to ensure access to digital information over time.

In this project, preservation processes will be realised as data pipelines and described formally as automated, quality-assured Taverna preservation workflows. The workflows will invoke various services for planning and executing institutional preservation and quality assurance strategies. The workflows will be deployed on a large scale (using Hadoop MapReduce clouds) and executed over large, distributed and heterogeneous collections of complex digital objects.

The workflows will make preservation processes reproducible and will collect provenance data throughout the entire lifecycle of each digital object.

The execution of workflows will be controlled by a policy-based “planning and watch” system, which will ensure that the workflows stay in line with the state of the art in digital object representation, file formats, rendering tools, etc., and will detect and report any errors in a preservation process.

SCAPE and Taverna in use

by Sven Schlarb, sven.schlarb@onb.ac.at, Austrian National Library

SCAPE's testbeds cover three application areas: Digital Repositories, from the library community; Web Content, from the web archiving community; and Research Data Sets, from the scientific community. The testbeds are used to evaluate the solutions developed by the SCAPE project against defined institutional data sets, in order to validate their applicability to real-life scenarios such as large-scale data repository ingest workflows or data archive maintenance.

Taverna is being used to develop concept workflows locally in order to study the feasibility of different approaches. Taverna Server is used for remote execution of workflows, with the Tool service as the main integration pattern.

From a research perspective, Taverna is currently used as an ETL (Extract, Transform, Load) tool for creating queryable data tables from large digital object collections. Content analysis and quality assurance are also performed using Taverna.

Austrian National Library

Max Kaiser, Head of Department of Research and Development
Bettina Kann, Head of Digital Library