Taverna uses its own data document format for various reasons. Corresponding to this is an object model which is used to convey data between components in the core and extensions of the enactor. In order to make use of Taverna’s functionality from your code you need to understand the design and API for this data model, and optionally the XML serialization of it (although this can be handled directly from the API). This document applies to versions of Taverna 1.x from beta6 onwards.
Typing
The data within the workflow carries three distinct and orthogonal classes of metadata. The first is a very loose typing of the data defined by a set of collection style type constructors and lists of MIME types. The second is a free text description and the third structured metadata in the form of RDF statements. The last two of these have no direct impact upon the API or document formats and so will not be covered in any great depth in this document. The first, however, is clearly critical.
We decided that the problem of data typing in life sciences is simply too hard for us to attack. The domain is an open one, in that we do not control all the resources that we need to be able to describe. In addition to this, it is not desirable to have the workflow engine and supporting software mandate any particular data typing on the services it uses. Our use for a type system is therefore fairly restricted; we need to have information about the overall data format in order to be able to render results, and we also require the workflow framework to comprehend the cardinality of the data and its gross structure in order to be able to perform translations such as single item to single item list or to automatically detect and reconcile list -> single item type relations (activating the implicit iteration support to repeatedly call a simple service on the contents of a collection). Aside from these two uses, we actually regard typing information as overly restrictive.
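The two cardinality adjustments mentioned above can be pictured with a minimal sketch. This is not the enactor's code: the class CardinalitySketch and its methods are hypothetical names chosen for illustration, and the "service" is modelled as a plain Java function.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in Taverna.
final class CardinalitySketch {

    /** Single item to single-item list: used when a consumer expects a list. */
    static <T> List<T> wrapSingle(T item) {
        return Collections.singletonList(item);
    }

    /**
     * Implicit iteration: a service that accepts one item is called once per
     * element of the supplied list, and the results are collected in order.
     */
    static <A, B> List<B> iterate(List<A> items, Function<A, B> service) {
        List<B> results = new ArrayList<>(items.size());
        for (A item : items) {
            results.add(service.apply(item));
        }
        return results;
    }
}
```

Both adjustments depend only on the collection structure of the data, which is why the framework needs to know the cardinality but not the detailed content type.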
Our type system for Taverna therefore looks something like optional, arbitrarily nested collection constructors wrapping a list of one or more MIME types. In the bioinformatics context, the most common of these by far is the trivial TEXT/PLAIN type. Syntactically Taverna types are something like:
TYPE = PRIMITIVE | COLLECTION
PRIMITIVE = "'" <MIMETYPE> ["," <MIMETYPE>]* "'"
COLLECTION = ("s" | "l" | "p" | "t") "(" TYPE ")"
So, for a list of trees of HTML documents the type is expressed as: l(t('TEXT/HTML')). The collection constructors are as follows:
l(..) Ordered List
p(..) Partial Order
t(..) Tree
s(..) Set
All of these cases are also partial orders, so if nothing is known about the ordering and the code doesn't want to introspect on the internal data structure it is perfectly acceptable to simply state that everything is a partial order. This may not be helpful but is technically correct. In the current implementation we only support set and list types; thus far this is all we've needed, although the type system has been designed to support more complex orderings should that be required.
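The type syntax is small enough to check with a few lines of code. The following is a minimal sketch rather than anything shipped with Taverna: the class TavernaTypeSyntax and its ParsedType holder are hypothetical, and the sketch assumes plain ASCII single quotes around the MIME type list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Taverna API.
final class TavernaTypeSyntax {

    /** Collection constructors from outermost to innermost, plus the MIME types. */
    static final class ParsedType {
        final String constructors;     // e.g. "lt" for l(t('text/html'))
        final List<String> mimeTypes;  // e.g. [text/html]

        ParsedType(String constructors, List<String> mimeTypes) {
            this.constructors = constructors;
            this.mimeTypes = mimeTypes;
        }
    }

    static ParsedType parse(String type) {
        StringBuilder constructors = new StringBuilder();
        int start = 0;
        int end = type.length();
        // Peel off collection constructors: one of s, l, p, t, then "(" ... ")".
        while (end - start >= 3
                && "slpt".indexOf(type.charAt(start)) >= 0
                && type.charAt(start + 1) == '('
                && type.charAt(end - 1) == ')') {
            constructors.append(type.charAt(start));
            start += 2;
            end -= 1;
        }
        // What remains must be a quoted, comma-separated list of MIME types.
        if (end - start < 3 || type.charAt(start) != '\'' || type.charAt(end - 1) != '\'') {
            throw new IllegalArgumentException("Not a valid type: " + type);
        }
        List<String> mimeTypes = new ArrayList<>();
        for (String mime : type.substring(start + 1, end - 1).split(",")) {
            if (mime.isEmpty()) {
                throw new IllegalArgumentException("Empty MIME type in: " + type);
            }
            mimeTypes.add(mime);
        }
        return new ParsedType(constructors.toString(), mimeTypes);
    }
}
```

For the example type l(t('text/html')), this yields the constructor string "lt" and a single MIME type, text/html. A primitive type such as 'text/plain' yields an empty constructor string.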
On the wire encoding and object conversion
The XML representation of the data is not human readable. This is largely due to the use of the Base64 encoding scheme to store the actual data items. For this reason, and obviously for use within Taverna, we have a set of tools that allow the construction of the data document from Java objects and vice versa.
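To see why Base64 content is hard to read by eye, consider the short sketch below. It uses the standard java.util.Base64 class from current Java releases purely for illustration; the encoder Taverna itself uses, and the class name Base64Sketch, are assumptions and not taken from the Taverna code base.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only; not the encoder used inside Taverna.
final class Base64Sketch {

    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A four-letter sequence becomes an opaque token.
        System.out.println(encode("ATGC")); // prints QVRHQw==
    }
}
```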
Before being stored in a DataThing object, any data structure undergoes a simple conversion process. This consists of converting any array types into List objects, the exception being byte[] objects, which are left unchanged. Higher-dimension arrays are still converted, so a byte[][] is converted to a List of byte[] objects. The rationale is that a byte[] is normally used to represent a single piece of binary data whereas, for example, a String[] is actually a collection of Strings. Because of this conversion, round-tripping data from object to DataThing and back will often produce a different object from the original (a String[] comes back as a List, for example); we have so far found no problems with this, but in theory we could store additional data to allow the back conversion to occur properly. If you need this then you should tell us, and ideally join the development team!
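The conversion rule can be sketched in a few lines of reflection code. This is an approximation written for this document, not the actual Baclava implementation; the class name ArrayConversionSketch is hypothetical.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the described rule, not Taverna's own code.
final class ArrayConversionSketch {

    static Object convert(Object value) {
        // A byte[] is treated as one binary item and left untouched;
        // non-array values are also returned as they are.
        if (value == null || value instanceof byte[] || !value.getClass().isArray()) {
            return value;
        }
        // Any other array becomes a List, converting nested arrays recursively.
        int length = Array.getLength(value);
        List<Object> list = new ArrayList<>(length);
        for (int i = 0; i < length; i++) {
            list.add(convert(Array.get(value, i)));
        }
        return list;
    }
}
```

Under this sketch a String[] becomes a List of Strings, and a byte[][] becomes a List whose elements are still byte[] objects.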
The relations between data items are encoded explicitly in the XML format, although as stated above there are only two types currently supported, that of lists and sets (total and zero orders respectively).
Points of contact – where to use the Baclava APIs
You will need to make use of these APIs if you are doing any of the following. Firstly, should you wish to enact a workflow you will use these APIs to construct any input document required and to parse the output from the enactor into objects. Secondly, should you be extending the Taverna tasks, i.e. creating a new processor type, you will need to be familiar with the process of constructing new DataThing objects, manipulating their metadata etc.
Creating DataThing objects
The DataThingFactory class provides a single static method bake(Object ..) that builds a new DataThing object from the supplied Java object. This applies the conversion process described above and creates the appropriate metadata dictionaries within the DataThing object. This is the only way you should create a new DataThing.
// Create a DataThing with a single string value, type is 'text/plain'
DataThing theThing = DataThingFactory.bake("A single string");
// Create a DataThing with an array of strings, type is l('text/plain')
DataThing theArrayThing = DataThingFactory.bake(new String[] {"Foo", "Bar"});
Providing inputs to and parsing outputs from the enactor
The enactor both consumes and returns a document containing a collection of DataThing representations in XML form. Fortunately this is not hard to work with provided you use the Baclava API. We strongly advise against trying to handle the XML format yourself; we make no guarantees that it will remain stable whereas the API should be reasonably solid. In general these documents are comprised of maps where the keys correspond to named workflow inputs or outputs (depending on the context) and the values are DataThing objects. In order to generate the document representation from a Map or vice versa you should use the methods in the DataThingXMLFactory class.
For example, suppose the workflow requires two inputs: a string list called 'inputList' and a string called 'inputString'. When the enactor is a remote service, the following code creates the required Document (when the enactor is invoked locally the conversion to XML is not needed, as the map can be passed straight to it):
// Create a new Map and put the DataThing objects defined above into it
Map inputMap = new HashMap();
inputMap.put("inputList", theArrayThing);
inputMap.put("inputString", theThing);
// Create the input document
Document inputDocument = DataThingXMLFactory.getDataDocument(inputMap);
From the JDOM Document object you can create the string of XML to pass to the enactor Web service along with the workflow spec, user definition etc. Assuming all is well, you will receive the workflow results in an identical format and will presumably want to read them out. The following code fragment just prints the name and type of each DataThing along with the Java class name of the underlying data object.
// Get the Document from the enactor somehow (this method is not real!)
Document outputDocument = MyBogusEnactor.getResults();
// Get the Map of output names to DataThing objects
Map outputMap = DataThingXMLFactory.parseDataDocument(outputDocument);
for (Iterator i = outputMap.keySet().iterator(); i.hasNext();) {
    String outputName = (String) i.next();
    DataThing outputValue = (DataThing) outputMap.get(outputName);
    // Query the DataThing to get its syntactic type, i.e. l('text/plain') etc.
    String outputType = outputValue.getSyntacticType();
    // Get the data object that this DataThing is a wrapper for.
    Object outputContents = outputValue.getDataObject();
    // Show some token information about the DataThing.
    System.out.println("Found a data item in the output document with name '"
        + outputName + "' and type '" + outputType + "'");
    System.out.println("Class of data inside DataThing is "
        + outputContents.getClass().getName());
}
Visualising DataThing objects within a GUI
The ResultItemPanel class in the scuflui package provides a Swing component to render and allow exploration of the structure of a DataThing object. A subclass of JPanel, it is constructed with a single DataThing object and provides a split pane display: the left pane shows the collection structure of the DataThing, and selecting an item within this pane shows its value in the right-hand pane. The data is rendered according to the MIME types contained within the DataThing object. The currently understood values are:
MIME type | Rendering
image/* | assumes that the item is a byte[] and attempts to load it into a JLabel as an image
text/plain | the default, just displays text in a monospaced font
text/html | uses the JEditorPane class to render as HTML
text/rtf | as for text/html, only renders as Rich Text Format (RTF)
text/x-taverna-web-url | treats the contained string as a web URL and loads the linked page into the pane
text/x-graphviz | treats the contained string as a dot file from the graphviz package and renders using a local installation of dot, falls back to text if this fails
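The dispatch on MIME type described by this table can be sketched as a simple lookup. The code below is a hypothetical illustration, not the logic inside ResultItemPanel: the class name RendererChoiceSketch and the returned labels are invented here, and the real component builds Swing widgets rather than returning strings.

```java
import java.util.Locale;

// Illustrative sketch only; the real ResultItemPanel logic may differ.
final class RendererChoiceSketch {

    /** Returns a short label for the rendering strategy suggested by a MIME type. */
    static String chooseRenderer(String mimeType) {
        String mime = mimeType.trim().toLowerCase(Locale.ROOT);
        if (mime.startsWith("image/")) {
            return "image";
        }
        switch (mime) {
            case "text/html":
                return "html";
            case "text/rtf":
                return "rtf";
            case "text/x-taverna-web-url":
                return "web-url";
            case "text/x-graphviz":
                return "graphviz";
            default:
                // text/plain is the default: monospaced text.
                return "plain-text";
        }
    }
}
```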
The Taverna Workbench 1.7.x uses this class in conjunction with a tabbed pane display to provide a results panel with one tab per output from the enactor; this seems to be a relatively usable result browser. In addition to the basic visualization functionality, context menu clicks on the data items allow the user to save the contents to local disc.







