SCAPE

SCAPE (SCAlable Preservation Environments) is an EU-funded FP7 project that aims to build a scalable platform for planning and executing computation-intensive processes for the ingestion or migration of large data sets, with the goal of helping to automate digital preservation.

The data sets involved are heterogeneous collections (collections of objects of different types), collections whose data objects are themselves large or complex in structure, or collections containing a huge number of digital objects. The sheer volume of archived data makes it impossible to rely on current service-oriented architectures to ensure access to digital information over time.

In this project, preservation processes will be realised as data pipelines and described formally as automated, quality-assured preservation workflows in Taverna. The workflows will invoke various services for planning and executing institutional preservation and quality assurance strategies. They will be deployed at large scale (using Hadoop MapReduce clouds) and executed over large, distributed and heterogeneous collections of complex digital objects.

The workflows will make preservation processes reproducible and will collect provenance data over each digital object's entire lifecycle.
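
The idea of a preservation process as a data pipeline with quality assurance and provenance capture can be illustrated with a minimal sketch. It is not SCAPE code and uses neither Taverna nor Hadoop: the names DigitalObject, migrate, quality_check, map_phase and reduce_phase, the format mapping and the byte-equality check are all assumptions made for illustration. The map step migrates and checks each object independently, which is what makes the work easy to distribute, and the reduce step summarises the outcomes.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical migration targets, chosen only for this example.
TARGET_FORMAT = {"tiff": "jp2", "doc": "pdf"}


@dataclass
class DigitalObject:
    identifier: str
    fmt: str
    payload: bytes
    provenance: list = field(default_factory=list)  # ordered log of applied steps


def migrate(obj: DigitalObject) -> DigitalObject:
    """Toy migration: relabel the format and append a provenance event."""
    target = TARGET_FORMAT[obj.fmt]
    event = {"step": "migrate", "from": obj.fmt, "to": target}
    return DigitalObject(obj.identifier, target, obj.payload, obj.provenance + [event])


def quality_check(original: DigitalObject, migrated: DigitalObject) -> DigitalObject:
    """Toy QA: a real check would compare rendered content, not raw bytes."""
    event = {"step": "qa", "passed": original.payload == migrated.payload}
    return DigitalObject(
        migrated.identifier, migrated.fmt, migrated.payload, migrated.provenance + [event]
    )


def map_phase(objects):
    # Each object is processed independently of the others.
    return [quality_check(obj, migrate(obj)) for obj in objects]


def reduce_phase(results):
    # Aggregate the QA outcome recorded as the last provenance event.
    return Counter("passed" if r.provenance[-1]["passed"] else "failed" for r in results)


collection = [
    DigitalObject("obj-1", "tiff", b"image-bytes"),
    DigitalObject("obj-2", "doc", b"text-bytes"),
]
results = map_phase(collection)
summary = reduce_phase(results)
```

Because every step appends to the provenance list rather than overwriting it, the history of an object can be replayed later, which is the property the reproducibility claim relies on.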

The execution of workflows will be controlled by a policy-based “planning and watch” system, which will ensure that the workflows are in line with the state of the art in digital object representation, file formats, rendering tools and so on, and will detect and report any errors in a preservation process.
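
A policy-based watch step can be sketched in the same hedged spirit. The Policy class, the watch function and the format lists below are hypothetical and do not reflect SCAPE's actual policy model. The sketch only shows the general pattern: compare a profile of the collection against an institutional policy and report what needs a preservation plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    accepted_formats: frozenset  # formats the institution considers safe (assumed)
    preferred_target: dict       # at-risk format -> migration target (assumed)


def watch(collection_formats: dict, policy: Policy) -> list:
    """Return one finding per format in the collection that violates the policy."""
    findings = []
    for fmt, count in sorted(collection_formats.items()):
        if fmt in policy.accepted_formats:
            continue
        target = policy.preferred_target.get(fmt)
        if target is None:
            findings.append(f"{fmt}: {count} objects, no migration target defined")
        else:
            findings.append(f"{fmt}: {count} objects, plan migration to {target}")
    return findings


policy = Policy(frozenset({"jp2", "pdf"}), {"tiff": "jp2"})
findings = watch({"jp2": 120, "tiff": 40, "wpd": 3}, policy)
```

Keeping the policy as data, separate from the checking logic, means the check can be rerun unchanged whenever the policy or the collection profile is updated.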