What is a workflow?

The Free On-line Dictionary of Computing (FOLDOC) defines ‘workflow’ as:

2. <job> The set of relationships between all the activities in
a project, from start to finish. Activities are related by different types
of trigger relation. Activities may be triggered by external events or by
other activities.

For our purposes, the definition we adhere to is: the co-ordination of one or more services into a data analysis pipeline. The pipeline is treated as a single entity: a workflow.

Workflows in science

What does this all mean in a scientific context? The overall project referred to is your analysis or experiment. The services are simple operations within that analysis, each with a certain number of inputs and outputs. In the case of fetching a DNA sequence, an input may be the identifier of the sequence, whilst the output is a string containing the nucleotide sequence that the identifier refers to.

The triggering of activities by other activities corresponds to one operation feeding data into a subsequent operation. For example, the ‘fetch sequence’ operation may feed its output (the string containing the sequence ‘ACTG’) into a ‘transcribe’ operation, which would then convert the DNA sequence into an RNA sequence. We would then have a simple workflow with one operation, and a link, which looks something like the following:

The diagrammatic representation of this workflow should be straightforward to read. Information flows from the top to the bottom. Entering at the top, a workflow input (in this case called ‘DNA_sequence’) passes through one operation, with the result exiting at the bottom into the output ‘RNA_sequence’. In this case the named inputs and outputs correspond directly to the named parameters you would otherwise have to enter into a Web form or type on the command line.
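To make the idea of named inputs, an operation and named outputs concrete, here is a minimal sketch in Python. It is not how Taverna itself represents workflows. The function names (transcribe, run_workflow) are hypothetical, and the ‘transcribe’ operation is assumed to simply replace T with U, which is one common convention for turning a DNA string into an RNA string.

```python
from typing import Dict


def transcribe(dna_sequence: str) -> str:
    """A single operation: one input (DNA string), one output (RNA string).

    Assumption: transcription is modelled as replacing T with U.
    """
    return dna_sequence.upper().replace("T", "U")


def run_workflow(inputs: Dict[str, str]) -> Dict[str, str]:
    """A one-operation workflow with a named input and a named output.

    The workflow input 'DNA_sequence' is linked to the operation's input,
    and the operation's result is linked to the workflow output
    'RNA_sequence'.
    """
    rna = transcribe(inputs["DNA_sequence"])
    return {"RNA_sequence": rna}


if __name__ == "__main__":
    print(run_workflow({"DNA_sequence": "ACTG"}))
```

A workflow with more operations would follow the same pattern: each link passes the output of one function to the input of the next, and only the named workflow inputs and outputs are visible from outside.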

Where do these operations live?

When you run a tool from a command line on your workstation, or visit a Web page with some interactive function such as those provided by the EBI and NCBI, you are reasonably sure where the functionality is located. In the first case you are using an application on your own machine. It runs like any other program you may have installed, and you (or your system administrator) must install it and keep any ancillary data up to date (e.g. by downloading the latest database release). In the second case you are assuming that the service provider (the institute or organization providing the Web page) will do this for you. You lose the overhead associated with maintaining the service, at the cost of having to access it through a Web browser interface, which is fine for one or two queries but not so good for thousands.

In the case of operations within a service-oriented workflow system, such as Taverna, the operations you are accessing are primarily located on other machines, and may even be in different countries. The role of Taverna is to remove the tedious parts of routine data analyses, e.g. the need to cut and paste data, press buttons on forms etc. Although Taverna does not access Web pages itself, you can think of it as working in much the same way. Each service (the replacement for the Web page) is provided by an institute that has some code you can run, without your actually having to own or possess the code. All you have to do is point Taverna to the location of the interface to the code. This is typically a URL that points to a WSDL (Web Services Description Language) file. A WSDL file describes how a client program (Taverna) can access a piece of code located on a remote machine. These files are not designed to be human readable, but Taverna can interpret them to give you access to the operations they describe. An example file can be found at: http://xml.nig.ac.jp/wsdl/Blast.wsdl
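As a rough illustration of what a client can learn from a WSDL file, the sketch below parses a tiny WSDL 1.1 fragment and lists the operations it declares. The fragment is invented for this example (the service name, namespace and operation names are all hypothetical) and is heavily trimmed. It is not the content of the file mentioned above, and a real WSDL file also describes messages, bindings and the service address. This is also not how Taverna reads WSDL internally, only a way to see the kind of information such a file carries.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by WSDL 1.1 elements.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, heavily trimmed WSDL document made up for illustration.
EXAMPLE_WSDL = """<definitions name="ExampleSequenceService"
             targetNamespace="http://example.org/sequence"
             xmlns="http://schemas.xmlsoap.org/wsdl/">
  <portType name="SequencePortType">
    <operation name="fetchSequence"/>
    <operation name="transcribe"/>
  </portType>
</definitions>
"""


def list_operations(wsdl_text: str) -> List[str]:
    """Return the operation names declared in the portType elements."""
    root = ET.fromstring(wsdl_text)
    operations = root.findall("wsdl:portType/wsdl:operation", WSDL_NS)
    return [op.get("name") for op in operations]


if __name__ == "__main__":
    for name in list_operations(EXAMPLE_WSDL):
        print(name)
```

Each operation name found this way corresponds to something a workflow tool could offer as a building block, in the same way that ‘fetch sequence’ and ‘transcribe’ appear as operations in the example earlier in this section.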

Massive power, minimal complexity

Because the operations within a workflow do not normally reside on the machine you use to create and run the workflow, your local machine does not have to be a supercomputer! By installing and using the Taverna Workbench application you can tap into the resources of tens of institutes, hundreds of analysis applications and literally thousands of processors' worth of computational power, entirely for free, with no installation or support hassle for you.

Of course, if you already have significant resources in-house, it is a relatively simple matter to integrate them with those available from other sites.

If all this sounds too good to be true, well, all we can ask is that you try it: download the Taverna Workbench and have a play. We think you might be surprised.