Success stories

Taverna has been used in a large number of different domains from music to meteorology to medicine. The success stories shown here represent a very small sample of the impact that Taverna has had on science and scientists.

In general workflow use

The recovery of data about mutated proteins

The EMBRACE project has used Taverna to automate the cleaning and extraction of data from the tGRAP database of mutated G-protein-coupled receptors.

This work is described in the document The reincarnation of the tGRAP database by Vroling.

The identification of mismatches and possible annotations in workflows

The ISpider project developed three tools that use Taverna’s functionality and pluggability to support the development of workflows, in particular for proteomics.

Much of this work was done by Khalid Belhajjame, and additional information can be found in his publications.

A tool for the identification and the characterisation of mismatches in scientific workflows

By using the Feta service registry and extending the capabilities of the Taverna Workbench, this tool automatically identifies and characterises the mismatches that may arise between the constituent operations of an in silico experiment.
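The tool's actual algorithm is not given here. As a minimal sketch, assume each operation's inputs and outputs are annotated with concepts from an ontology (the myGrid Ontology plays that role in practice); a made-up four-concept hierarchy stands in for it below. A data link is then suspect when the concept produced is not subsumed by the concept expected. The "too general" and "incompatible" labels, and the operation names, are hypothetical illustrations rather than the tool's own categories.

```python
# Hypothetical mini-ontology: each concept maps to its parent concept (None = top level).
ONTOLOGY = {
    "BiologicalSequence": None,
    "NucleotideSequence": "BiologicalSequence",
    "ProteinSequence": "BiologicalSequence",
    "UniProtAccession": None,
}


def is_subsumed(concept, ancestor):
    """True if `concept` is `ancestor` or one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False


def find_mismatches(links):
    """Check each data link (producer, output concept, consumer, input concept).

    A link is fine when the output concept is subsumed by the expected input.
    Otherwise it is characterised as 'too general' (the expected input is a
    narrower concept than the output, so only some values will fit) or
    'incompatible' (the two concepts are unrelated).
    """
    mismatches = []
    for producer, out_c, consumer, in_c in links:
        if is_subsumed(out_c, in_c):
            continue
        kind = "too general" if is_subsumed(in_c, out_c) else "incompatible"
        mismatches.append((producer, consumer, kind))
    return mismatches


# Hypothetical workflow links between annotated operations.
links = [
    ("get_protein", "ProteinSequence", "run_blast", "BiologicalSequence"),
    ("get_sequence", "BiologicalSequence", "translate", "NucleotideSequence"),
    ("lookup_id", "UniProtAccession", "run_blast", "BiologicalSequence"),
]
for mismatch in find_mismatches(links):
    print(mismatch)
```

Distinguishing "too general" from "incompatible" matters because the first may still work for some inputs at run time, whereas the second almost certainly needs a converter service between the two operations.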

A tool for automatically inferring service annotations

This tool automatically derives annotations for service parameters from existing, tried-and-tested workflows. It has been evaluated using the workflow repository provided by myExperiment together with the myGrid Ontology.

A system for querying multiple proteomics data sources

The system supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools for distributed data analysis. It has been used to provide integrated access to four proteomics data sources, namely gpmDB, PedroDB, PepSeeker and Pride.

In knowledge extraction workflows

Researchers at NBIC use Taverna as an enactment platform for BioAID workflows, which use the AIDA (Adaptive Information Disclosure Application) toolkit. AIDA, developed by VL-e, offers services for knowledge extraction, text mining and knowledge management.

There is a myExperiment group for Adaptive Information Disclosure workflows.

The presentation My BioAID: personalised text mining with web services from the AIDA toolbox by Marco Roos describes the use of the AIDA toolkit and Taverna.

In chemistry

The integration of a chemistry-specific toolkit with Taverna

The Chemistry Development Kit (CDK) project have developed a plugin to allow their tools to be used within Taverna workflows.

There is a myExperiment group for workflows using the CDK-Taverna plugin.

This work has been described in the poster Creating chemo- & bioinformatics workflows, further developments within the CDK-Taverna Project by Kuhn et al.

In medicine

The use of imaging algorithms within workflows

The MIASGrid project produced several workflows that demonstrated the applicability of Taverna to the handling of large amounts of medical image data. The workflows used MATLAB and also involved interaction with users.

The workflows covered two domains:

  • The analysis of MRI scans of knees to detect changes in cartilage
  • The description and searching of a database of mammograms

In social science

The modelling of the open source software community

Taverna has been used to model both the research community that develops open source software and the software itself. This work, done by the Free/Libre Open Source Software Research community, is described on their Web site as well as in their publications.

There is a myExperiment group for researchers studying open source software development, through which members share some of their Taverna workflows.

In disease-related research

The analysis of the Anthrax bacterium

Anil Wipat and colleagues used Taverna to automate a series of analyses over all the proteins produced by the anthrax bacterium, producing, through a process of selection and elimination, a list of secreted proteins and their properties.
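The selection-and-elimination idea can be sketched as a chain of filters, each stage discarding proteins that fail one test. This is a minimal illustration only: the three criteria and the `Protein` fields are hypothetical stand-ins for the outputs of prediction services, not the criteria used in the published work, and the four-protein "proteome" is made up.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    accession: str
    has_signal_peptide: bool  # hypothetical output of a signal-peptide predictor
    tm_helices: int           # hypothetical count from a transmembrane-helix predictor
    is_lipoprotein: bool      # hypothetical output of a lipoprotein-signal predictor


# Illustrative elimination stages: each keeps only the proteins that pass.
STAGES = [
    ("has signal peptide", lambda p: p.has_signal_peptide),
    ("no transmembrane helices", lambda p: p.tm_helices == 0),
    ("not a lipoprotein", lambda p: not p.is_lipoprotein),
]


def predict_secreted(proteome):
    """Apply every stage in order and return the surviving proteins,
    together with how many proteins remained after each stage."""
    remaining = list(proteome)
    counts = []
    for name, keep in STAGES:
        remaining = [p for p in remaining if keep(p)]
        counts.append((name, len(remaining)))
    return remaining, counts


proteome = [
    Protein("P1", True, 0, False),
    Protein("P2", True, 2, False),
    Protein("P3", False, 0, False),
    Protein("P4", True, 0, True),
]
secreted, counts = predict_secreted(proteome)
print([p.accession for p in secreted])  # ['P1']
print(counts)
```

Keeping the per-stage counts is a cheap form of provenance: it records how many candidates each criterion removed, which is the kind of information a workflow run can capture automatically.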

The secreted proteins explain why anthrax can grow in animal hosts but not in soil.

The paper e-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins by Craddock et al. gives an overview of this work.

The study of resistance to Trypanosomiasis (sleeping sickness)

One of the major goals of biology, and consequently of bioinformatics, is to bridge the gap between genotype and phenotype. Microarray and Quantitative Trait Loci (QTL) data are increasingly used to aid the discovery of candidate genes that might be responsible for phenotypic differences. In previous years, studies of genotype-phenotype correlations were conducted manually. This has led to problems in identifying functional candidate genes, primarily because of the scale of the data being investigated and the reliance on specific expertise, which may bias the outcome of the investigation.

With the development of Web services and their connection into workflows, however, these large-scale datasets can be processed systematically, enabling detailed information to be gathered, published and subsequently re-investigated. This improves the prospects of bridging the gap between genotype and phenotype using pathways.

So far this investigation has highlighted the issues facing the manual analysis of microarray and QTL data, and how automated approaches provide a systematic means to investigate genotype-phenotype correlations. We were able to illustrate how the large-scale analysis of microarray gene expression and quantitative trait data, investigated at the level of biological pathways, enables links between genotype and phenotype to be established.

An example of a workflow used to gather pathway information for candidate genes from a QTL region is shown below. These workflows have so far been applied to two different genotype-phenotype problems:

  • The study of resistance to African Trypanosomiasis in mice infected with Trypanosoma congolense
  • The study of immunological effects and parasite expulsion in mice infected with Trichuris muris
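The core of the pathway-gathering step is: take the genes lying inside a QTL interval, then group them by the pathways they take part in, so that whole pathways rather than single genes can be examined as candidates. In the real workflows the gene coordinates and pathway memberships come from remote services; in this minimal sketch they are an inline table, and every gene name, coordinate and pathway name is hypothetical.

```python
from collections import defaultdict

# Hypothetical gene records: (gene, chromosome, start_bp, end_bp, pathways)
GENES = [
    ("GeneA", "17", 30_100_000, 30_150_000, ["PathwayX"]),
    ("GeneB", "17", 31_000_000, 31_020_000, ["PathwayX", "PathwayY"]),
    ("GeneC", "17", 45_000_000, 45_010_000, ["PathwayY"]),
    ("GeneD", "5", 30_500_000, 30_600_000, ["PathwayX"]),
]


def genes_in_qtl(genes, chrom, qtl_start, qtl_end):
    """Genes on `chrom` whose span overlaps the QTL interval [qtl_start, qtl_end]."""
    return [g for g in genes if g[1] == chrom and g[2] <= qtl_end and g[3] >= qtl_start]


def pathways_for(genes):
    """Map each pathway to the candidate genes that participate in it."""
    by_pathway = defaultdict(list)
    for name, *_, pathways in genes:
        for pathway in pathways:
            by_pathway[pathway].append(name)
    return dict(by_pathway)


candidates = genes_in_qtl(GENES, "17", 30_000_000, 32_000_000)
print(pathways_for(candidates))
# {'PathwayX': ['GeneA', 'GeneB'], 'PathwayY': ['GeneB']}
```

A natural next step, again as an assumption rather than a description of the published workflow, would be to intersect these pathways with genes found to be differentially expressed in the microarray data.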

Using this systematic, pathway driven approach, we were able to successfully identify a candidate gene and biological pathway believed to be strongly associated with resistance to African Trypanosomiasis.

The workflow for performing this analysis is available on myExperiment.

Publications

Articles and papers about the success of Taverna for trypanosomiasis (sleeping sickness) research.

The identification of genes linked to Graves disease

Graves Disease Scenario

The aim of this scenario is to identify and characterise genes located in regions of human chromosomes that show linkage to Graves disease (GD), as shown in the figure below. GD is an autoimmune disease of the thyroid in which an individual's immune system attacks cells in the thyroid gland, resulting in hyperthyroidism. This is caused by thyroid-stimulating autoantibodies, secreted by lymphocytes of the immune system, stimulating the thyrotrophin receptor.

Graves Disease Scenario

Affymetrix microarray studies

The GD candidate genes were identified by microarray analysis. Affymetrix U95A arrays were probed with RNA extracted from CD4-positive lymphocytes from four GD patients and four healthy controls. The four GD microarray datasets were then compared with the four control datasets using the Affymetrix data mining tool to identify differentially expressed genes.
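The method used by the Affymetrix data mining tool is not described here. As a swapped-in illustration of a four-versus-four comparison, the sketch below log2-transforms intensities and keeps probes that pass both a fold-change cut-off and a Welch t-statistic cut-off. The cut-off values, probe IDs and intensities are all hypothetical, and with four samples per group and no multiple-testing correction this is a screening sketch, not a statistically rigorous test.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances.
    Raises ZeroDivisionError if both samples have zero variance."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def differential_expression(expr, min_fold=2.0, min_abs_t=4.0):
    """expr maps probe ID -> (patient intensities, control intensities).

    Intensities are log2-transformed; a probe is reported when both the
    absolute log2 fold change and the absolute Welch t statistic pass the
    (arbitrary) cut-offs. Returns (probe, log2 fold change, t) tuples.
    """
    hits = []
    for probe, (patients, controls) in expr.items():
        lp = [math.log2(x) for x in patients]
        lc = [math.log2(x) for x in controls]
        log_fc = mean(lp) - mean(lc)
        t = welch_t(lp, lc)
        if abs(log_fc) >= math.log2(min_fold) and abs(t) >= min_abs_t:
            hits.append((probe, log_fc, t))
    return hits


# Hypothetical intensities for four GD patients and four healthy controls.
expr = {
    "probe_1": ([400, 420, 380, 410], [100, 110, 95, 105]),
    "probe_2": ([100, 150, 90, 120], [110, 95, 130, 100]),
}
for probe, log_fc, t in differential_expression(expr):
    print(f"{probe}: log2FC={log_fc:.2f}, t={t:.1f}")
```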

Annotation Pipeline

Over 50 genes were found to be differentially expressed in CD4-positive lymphocytes from GD patients. In order to understand why these genes were expressed in lymphocytes from GD patients but not in healthy individuals, the GD biologist would like to use myGrid to query public databases such as EMBL, GO, HGVBASE and MEDLINE to view information about gene structure and function, chromosome location, the presence of single nucleotide polymorphisms (SNPs), expression control features and association with other genetic diseases. The experimental conditions and diseases in which the expression of the candidate genes is significantly altered also need to be identified from OMIM.

Genotype Assay Design System

SNPs are small (single base pair) genetic variations found in the genome between individuals. The differential expression of the candidate genes in GD individuals may be due to, or related to, the presence of SNPs associated with GD. The GD biologist is interested in identifying those SNPs that are found in her GD patients and determining their frequency.

Restriction fragment length polymorphism (RFLP) assays are developed to genotype SNPs in her candidate genes. A region flanking either side of the SNP is amplified using polymerase chain reaction (PCR). The amplified PCR product is digested with a suitable restriction enzyme (i.e. one that will cut at one SNP allele and not the other) and the products are run on agarose gels to view product size and determine the genotype.

The GD biologist would like to use myGrid to:

  1. Query databases to retrieve SNP information associated with candidate genes.
  2. Aid in the design of primers (short pieces of DNA that mark the start and end points of the section of the DNA sequence she wants to amplify) for the PCR experiment.
  3. Select the restriction enzyme that is specific to a particular SNP for the RFLP experiment.
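The enzyme-selection step rests on a simple observation: the two allele-specific PCR products differ only at the SNP, so an enzyme is useful for RFLP genotyping exactly when it has a different number of recognition sites in the two products. The sketch below applies that test to a small table of common enzymes. The flanking sequences and alleles are hypothetical, and a real design tool would also check that the resulting fragment sizes can be resolved on an agarose gel.

```python
# Recognition sites of a few common type II restriction enzymes. All are
# palindromic, so scanning one strand also covers the reverse strand.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
    "MspI": "CCGG",
}


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of `site` in `seq`."""
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site)


def discriminating_enzymes(flank5, flank3, allele1, allele2, enzymes=ENZYMES):
    """Return {enzyme: allele with more sites} for enzymes whose site count
    differs between the two allele-specific amplicons. Because the amplicons
    differ only at the SNP, any difference means the SNP creates or destroys
    a site, giving different fragment patterns on the gel."""
    amplicon1 = flank5 + allele1 + flank3
    amplicon2 = flank5 + allele2 + flank3
    result = {}
    for name, site in enzymes.items():
        n1, n2 = count_sites(amplicon1, site), count_sites(amplicon2, site)
        if n1 != n2:
            result[name] = allele1 if n1 > n2 else allele2
    return result


# Hypothetical SNP (T/G) between two short flanking sequences.
print(discriminating_enzymes("ATGCGAAT", "CAGGTA", "T", "G"))  # {'EcoRI': 'T'}
```

Here the T allele completes a GAATTC site, so EcoRI cuts the T amplicon but leaves the G amplicon intact.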

3D Protein Structure & effect of coding SNP on protein active site

Any SNPs occurring in the coding regions of a candidate gene may give rise to a change in the amino acid sequence of the protein encoded by the gene. The GD biologist would like to use myGrid to:

  1. Query a protein structure database, e.g. PDB or MSD, to determine whether a structure of the protein encoded by her candidate gene is available. If so, view the protein structure to study how it relates to the function of the protein.
  2. Obtain information about the protein, e.g. its function and functional domains, by querying SWISS-PROT and InterPro. Use Sheffield’s AMBIT Web service to retrieve information about an active site whose characteristics may be altered because a coding SNP has changed the amino acid sequence in the part of the protein that forms the active site.
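Whether a coding SNP is worth following up at the protein-structure level depends first on whether it changes the encoded amino acid. A minimal sketch of that check using the standard genetic code is shown below; the coding sequence and SNP positions are hypothetical, and the AMBIT active-site lookup itself is not modelled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(cds, pos, alt):
    """Classify a SNP in a coding sequence.

    cds: uppercase DNA coding sequence starting at the start codon.
    pos: 0-based position of the SNP within cds.  alt: the alternative base.
    Returns (kind, change) where change is written like 'E2V' (1-based residue).
    """
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return kind, f"{ref_aa}{start // 3 + 1}{alt_aa}"


cds = "ATGGAAGTGTGG"  # hypothetical: Met-Glu-Val-Trp
print(classify_coding_snp(cds, 4, "T"))   # ('missense', 'E2V')
print(classify_coding_snp(cds, 5, "G"))   # ('synonymous', 'E2E')
print(classify_coding_snp(cds, 11, "A"))  # ('nonsense', 'W4*')
```

Only the missense case leads naturally to the structural question above: does the substituted residue fall within, or near, an active site?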

The workflow for the Graves Disease analysis is published on myExperiment.

Publications

Articles and papers about the success of Taverna for Graves Disease research.

The characterisation of genes associated with Williams-Beuren syndrome

Williams–Beuren Syndrome (WBS) is a rare, sporadically occurring microdeletion disorder caused by a 1.5 Mb deletion located in chromosome band 7q11.23. It is a complex, multisystem genetic disorder characterised by a distinctive combination of physical and behavioural attributes.

The region most commonly deleted in WBS is approximately 1.5 Mb and typically causes the deletion of 24 genes. This region is flanked by 320-500 kb of highly repetitive sequence, whose repetitive and complex nature makes it difficult to sequence and to map. Consequently, the region contains gaps in the genomic sequence that could contain genes, pseudogenes or regulatory elements contributing to WBS. In order to fully understand the pathology of WBS and to determine genotype-to-phenotype correlations, a complete and comprehensive map of the WBS region is required.

The aim of the project is to close the genomic gaps in the WBS region and characterise any genes or regulatory elements that are discovered. Taverna workflows were used to automate the time-consuming and repetitive series of analyses required to achieve this objective.

Analyses

Sequencing of the human genome is a continuous process, and sequencing across gapped regions is ongoing. As new sequence is produced, it can be compared with the known sequence surrounding the gaps to detect any overlap. Overlapping sequences can then be investigated further to characterise genes and extend the mapped region.
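The overlap test at the heart of this step can be illustrated with an exact suffix/prefix match: if the end of the known sequence flanking a gap matches the start of a newly released sequence, the new sequence extends the map into the gap. This is a deliberate simplification; the real analysis uses BLAST similarity searches on repeat-masked sequence, which tolerate mismatches and check both strands. The sequences and the `min_len` threshold below are hypothetical.

```python
def longest_overlap(known, new, min_len=20):
    """Length of the longest suffix of `known` that equals a prefix of `new`,
    provided it is at least `min_len` bases long; 0 if there is none."""
    min_len = max(min_len, 1)  # a zero-length overlap is meaningless
    for k in range(min(len(known), len(new)), min_len - 1, -1):
        if known[-k:] == new[:k]:
            return k
    return 0


def extend_flank(known, new, min_len=20):
    """Extend the known sequence into the gap if `new` overlaps its end;
    return None when there is no sufficiently long overlap."""
    k = longest_overlap(known, new, min_len)
    return known + new[k:] if k else None


# Hypothetical sequence flanking a gap, and a newly deposited sequence.
known = "ACGTACGTTAGCCGATAGGCTA"
new = "GCCGATAGGCTATTTCCGGA"
print(longest_overlap(known, new, min_len=10))  # 12
print(extend_flank(known, new, min_len=10))     # ACGTACGTTAGCCGATAGGCTATTTCCGGA
```

The minimum-length threshold guards against chance matches; in repetitive flanking regions such as those around the WBS deletion, masking repeats first matters for the same reason.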

This type of analysis involves the use of multiple services at multiple sites, for example BLAST for similarity searches, GenBank to retrieve new sequence data and RepeatMasker to mask repetitive DNA sequence regions. For gene characterisation, gene-finding tools such as GenScan need to be used, followed, once potential genes have been translated into amino acid sequences, by functional motif identification tools such as SignalP and pscan.

Advantages of workflows for WBS analysis

The WBS analyses described require intensive input from the bioinformatician. Results from one analysis must be cut-and-pasted into the input for the next, and reformatting is often required between analyses. This makes the process time-consuming, and its mundane, repetitive nature makes it prone to human error.

Automating the WBS analyses using myGrid workflows reduces these problems. Scheduling of workflow services to run in series means that the bioinformatician is free to do other research, perhaps running other workflows, whilst the experiment is running.
The careful capture of provenance information during the experiment invocation and the ability to capture results and semantic details of experiments in the myGrid Information Model and KAVE (Knowledge Annotation and Verification of Experiments) also provide great advantages in data handling.

Results

Performing a single WBS analysis manually can take anywhere between 1 and 2 weeks. Performing the same analysis using myGrid can reduce this time to a matter of hours.

The figure below shows the results of 4 workflow cycles (approximately 10 hours). The gapped region in this case contained a complement of known genes. All were identified correctly and their relative map positions in the region could be determined, refining the knowledge of the WBS region.

Williams-Beuren analysis

Publications

Articles and papers about the success of Taverna for Williams-Beuren syndrome research.

In the arts

The composition of music using Web services for synthesis

John Ffitch and others at the University of Bath have developed a service-oriented composition environment for music. The environment consists of three main components:

  • A suite of synthesis web services that incorporate a selection of the basic atomic components required for sound creation and processing
  • A tool that gives a description of music in terms of the construction of the instruments (parameters, connections) and the score they will play
  • An “environment of use” that allows the connection and enactment of the synthesis services

Both Triana and Taverna have been tested as the environment of use.

This work is described in the paper Composition with Sound Web Services and Workflows by Ffitch et al.

In education (as well as for teaching purposes)

The creation of courses from individual learning activities and resources

Carsten Ullrich of Shanghai Jiao Tong University and colleagues have used Taverna to integrate VIACIPA (a Web-based digital library of multimedia objects) and Paigos (a course generator) in order to formalize and automate the creation of learning courses based on specific user requirements.

The work is detailed in the paper Multimedia-Learning in a Life Science Workflow Environment by Ullrich et al.

In bioinformatics

The measuring of enzyme characteristics of yeast

The Manchester Centre for Integrative Systems Biology is measuring the kinetic and binding constants associated with enzyme reactions in metabolic pathways in the yeast Saccharomyces cerevisiae. Quantitative models of these metabolic pathways are being integrated with transcriptomic, proteomic and metabolomic data by workflows that have been constructed and enacted using Taverna.

A poster by Peter Li describing the systems biology workflows was presented at the International Conference on Systems Biology 2006, and he has written several publications about the work.

Many of the workflows make use of Systems Biology Markup Language (SBML) as described in the paper Automated manipulation of systems biology models using libSBML within Taverna workflows by Li et al.

An example workflow using SBML is available on myExperiment.

The integration of plant genome resources

As part of the PLANET (A Network of European Plant Databases) project, URGI (Unité de Recherche Génomique-Info) developed BioFloWeb.

BioFloWeb is a stand-alone Web application for running workflows of Web services. Users can choose among predefined workflows or define their own with Taverna. BioFloWeb has been demonstrated for the retrieval of information about Arabidopsis genes from several European databases.

BioFloWeb makes extensive use of BioMoby services.

The annotation of genomes

A collaboration between Tom Oinn from the myGrid team and Anders Lanzen, Svenn Helge Grindhaug and Pal Puntervoll from the University of Bergen, Norway, has produced an interactive genome annotation pipeline.

Sequencing, characterising and annotating a genome are the first steps towards understanding its function. Important stages in this process include gene prediction, comparative genomics and the prediction of the function of genes and gene products. With workflows, all of these stages can be automated, requiring little human interaction. However, manual inspection may be required at certain points in the process.

Articles and papers about the success of Taverna for genome annotation are available on-line.

The examination of gene expression from microarray data using R

Ingo Wassink developed the R plugin for Taverna under the BioRange project.

Workflows were developed by Peter Li and others, including Ingo, to perform statistical analysis of microarray data in order to study gene expression.

Some of these workflows are available on myExperiment.

The paper “Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data” by Li et al. describes some of this work.
