Taverna has now moved to the Apache Software Foundation. For updated information, see Apache Taverna (incubating).

Chemistry Web Services

This document describes a set of chemistry services provided by ChemSpider, ChEBI and PubChem that can be used to construct chemistry workflows in the Taverna Workbench.

The example workflows referenced in this document show how some of the operations in these services can be invoked. They can also be downloaded as part of the Chemistry Workflows pack on myExperiment.

ChemSpider

ChemSpider is a chemistry search engine built to aggregate and index chemical structures and their associated information in a single searchable repository, which is freely available to everybody.

ChemSpider provides several Web services, and these are listed below together with their WSDL locations.

InChI Web service

WSDL: http://www.chemspider.com/InChI.asmx?WSDL
In BioCatalogue: http://www.biocatalogue.org/services/2164

The InChI Web service provides operations to manipulate InChI strings and InChIKeys, including conversion to and from the MOL file format, validity checking of InChI identifiers, and searching ChemSpider using InChI inputs.

Some of the Web service’s operations listed below require a “security token”. A security token can be obtained by completing the registration process for ChemSpider at http://www.chemspider.com/Register.aspx.
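
In Taverna, the inputs of a WSDL operation appear as ports, so the security token is simply connected like any other input. To show roughly what travels over the wire, the Python sketch below builds a SOAP 1.1 request for a token-protected operation. The namespace `http://www.chemspider.com/` and the input names `csid` and `token` are assumptions, and the values are placeholders. Check the actual names against each service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumption: target namespace of the ChemSpider ASMX services; confirm it in the WSDL.
CS_NS = "http://www.chemspider.com/"

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("cs", CS_NS)


def build_request(operation: str, params: dict) -> bytes:
    """Build a SOAP 1.1 envelope for one ChemSpider operation.

    Every entry in `params` becomes a child element of the operation
    element, so the security token travels like any other input.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{CS_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{CS_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Hypothetical input names and placeholder values; the real input names
# for each operation are listed in its WSDL.
request = build_request("CSIDToMol", {"csid": 12345, "token": "YOUR-SECURITY-TOKEN"})
print(request.decode("utf-8"))
```

The token is passed as an ordinary element of the operation body, not as a SOAP header.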

The following operations are contained within the InChI Web service:

  • CSIDToMol
    Converts a ChemSpider identifier to a MOL file. A security token is required to get access to this service.
    Example workflow
  • GenerateInChI
    Generates an InChI string for a given chemical represented by its SMILES string, SDF or MOL file.
    Example workflow
  • GenerateInChIInfo
    Returns information relating to the InChI string for a given chemical compound.
    Example workflow
  • GenerateInChIKey
    Returns a hashed InChIKey: a fixed-length (25-character), condensed digital representation of the input InChI string that is not intended to be human-readable.
    Example workflow
  • InChIKeyToCSID
    Converts an InChI key to a ChemSpider identifier.
    Example workflow
  • InChIToCSID
    Converts an InChI string to a ChemSpider identifier.
    Example workflow
  • InChIToInChIKey
    Converts an InChI string to an InChI key. Works only for v1.02b InChI strings.
    Example workflow
  • InChIToSMILES
    Converts an InChI string to a SMILES string. Uses OpenBabel internally to perform this operation.
    Example workflow
  • IsValidInChIKey
    Checks that an InChI key is valid. Works only for v1.02b InChI keys.
    Example workflow
  • MolToInChI
    Converts a MOL file into an InChI string (v1.02s).
    Example workflow
  • ResolveInChIKey
    This operation does not work at the moment. ChemSpider have been notified of the problem.
  • SMILESToInChI
    Converts a SMILES string to an InChI string. The result is returned as a v1.02s InChI string.
    Example workflow
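
Before spending a service call on GenerateInChIKey output or IsValidInChIKey, a workflow may want a cheap local filter for obviously malformed keys. The Python sketch below does this. It assumes that a 25-character v1.02b key consists of 14 uppercase letters, a hyphen and 10 uppercase letters. The check only covers syntax and does not verify the hash, so IsValidInChIKey remains the authoritative test.

```python
import re

# Assumed layout of a 25-character v1.02b InChIKey: 14 uppercase letters,
# a hyphen, then 10 uppercase letters (14 + 1 + 10 = 25).
BETA_INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}")


def looks_like_beta_inchikey(key: str) -> bool:
    """Local syntax pre-check only; the hash itself is not verified."""
    key = key.strip()
    if key.startswith("InChIKey="):
        key = key[len("InChIKey="):]
    return BETA_INCHIKEY.fullmatch(key) is not None


print(looks_like_beta_inchikey("ABCDEFGHIJKLMN-OPQRSTUVWX"))    # True (synthetic key)
print(looks_like_beta_inchikey("ABCDEFGHIJKLMN-OPQRSTUVWX-N"))  # False (27 characters)
```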

OpenBabel Web service

WSDL: http://www.chemspider.com/OpenBabel.asmx?WSDL

  • convert
    Converts a molecule represented in one format to another. For a list of valid format values, see the Open Babel website. An empty string is returned in case of failure.
    Example workflow
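
Because convert reports failure by returning an empty string, a client should check for that case explicitly instead of passing an empty result downstream. The Python sketch below wraps any convert-style call. The callable and its argument order are assumptions standing in for a real service client, and the two stubs only demonstrate the success and failure paths.

```python
from typing import Callable


class ConversionError(RuntimeError):
    """Raised when the service signals failure with an empty string."""


def checked_convert(convert: Callable[[str, str, str], str],
                    data: str, in_format: str, out_format: str) -> str:
    """Invoke a convert-style call and turn an empty reply into an exception."""
    result = convert(data, in_format, out_format)
    if not result or not result.strip():
        raise ConversionError(f"conversion {in_format} -> {out_format} failed")
    return result


# Stand-ins for a real service client, used only to show both outcomes.
def ok_stub(data, in_fmt, out_fmt):
    return "CCO"


def failing_stub(data, in_fmt, out_fmt):
    return ""


print(checked_convert(ok_stub, "<input molecule>", "inchi", "smi"))  # CCO
```

The same check applies to GetRecordMol, which also returns an empty string on failure.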

Mass spec API Web service

WSDL: http://www.chemspider.com/MassSpecAPI.asmx?WSDL
In BioCatalogue: http://www.biocatalogue.org/services/2040

  • GetCompressedRecordsSdf
    Returns an SDF file containing records of compounds found by an asynchronous search operation. A security token associated with the ‘Service Subscriber’ role is required to access this operation.
    No example workflow is available, since this operation requires the ‘Service Subscriber’ role.
  • GetExtendedCompoundInfo
    Returns extended record details for a given ChemSpider identifier. A security token is required to access this service.
    Example workflow
  • GetExtendedCompoundInfoArray
    Returns an array of extended record details from an array of ChemSpider identifiers. A security token is required to access this service.
    Example workflow
  • GetRecordMol
    Returns a ChemSpider record in MOL format, or an empty string in case of failure. The calc3d parameter specifies whether 3D coordinates should be calculated before the record data is returned. A security token is required to access this service.
    Example workflow
  • GetRecordsSdf
    Returns an SDF file containing records found by an asynchronous search operation. A security token associated with the ‘Service Subscriber’ role is required to access this operation.
    No example workflow is available, since this operation requires the ‘Service Subscriber’ role.
  • SearchByFormula
    Searches ChemSpider compounds by molecular formula within a specified list of data sources. This operation is deprecated and will be removed soon – use SearchByFormulaAsync instead.
    Example workflow not available since this operation is deprecated.
  • SearchByFormula2
    Searches ChemSpider compounds by molecular formula.
    Example workflow not available since this operation is deprecated.
  • SearchByFormulaAsync
    Searches ChemSpider compounds by molecular formula within a specified list of data sources. A security token is required to access this service.
    Example workflow
  • SearchByMass
    Searches ChemSpider compounds by mass +/- range within a specified list of data sources. This operation is deprecated and will be removed soon – use SearchByMassAsync instead.
    Example workflow not available since this operation is deprecated.
  • SearchByMass2
    Searches ChemSpider compounds by mass +/- range.
    Example workflow not available since this operation is deprecated.
  • SearchByMassAsync
    Searches ChemSpider compounds by mass +/- range within a specified list of data sources. A security token is required to access this service.
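
The following Python sketch applies a “mass +/- range” window locally to a few candidate masses, to make the search semantics concrete. It assumes that the range is symmetric and inclusive and that monoisotopic masses are compared. How ChemSpider itself treats the window boundaries and mass types is not stated here.

```python
from typing import Dict, List


def in_mass_window(mass: float, tolerance: float,
                   candidates: Dict[str, float]) -> List[str]:
    """Names whose mass lies in the inclusive window [mass - tolerance, mass + tolerance]."""
    low, high = mass - tolerance, mass + tolerance
    return [name for name, m in candidates.items() if low <= m <= high]


# Monoisotopic masses in Da, from standard isotope masses (12C, 1H, 16O).
candidates = {
    "water (H2O)": 18.010565,
    "ethanol (C2H6O)": 46.041865,
    "glucose (C6H12O6)": 180.063388,
}
print(in_mass_window(180.06, 0.01, candidates))  # ['glucose (C6H12O6)']
```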

Spectra Web service

WSDL: http://www.chemspider.com/Spectra.asmx?WSDL

  • GetAllSpectraInfo
    Returns information for all open-access spectra in ChemSpider.
    Example workflow
  • GetCompoundSpectraInfo
    Returns information about spectra associated with a particular compound, identified by the cmp_id parameter.
    Example workflow
  • GetSpectrumInfo
    Returns information about a particular spectrum, identified by its spc_id parameter.
    Example workflow

Compound search Web service

WSDL: http://www.chemspider.com/Search.asmx?WSDL
In BioCatalogue: http://www.biocatalogue.org/services/1932

  • AsyncSimpleSearch
    Searches for molecules matching the entered search terms. The operation returns a transaction ID (a hash value), which can be passed to GetAsyncSearchStatus to check the status of the search and to GetAsyncSearchResult to retrieve the results. A security token is required to access this service.
    Example workflow
  • CSID2ExtRefs
    Returns a list of external references (data sources) for a given compound. A security token with the relevant role is required to access this service.
  • GetAsyncSearchResult
    Returns a list of identifiers found by the asynchronous search operation. A security token is required to access this service.
    Example workflow
  • GetAsyncSearchStatus
    Queries the asynchronous operation status. A security token is required to access this service.
    Example workflow
  • GetCompoundInfo
    Returns the record details (CSID, InChIKey, InChI, SMILES) of a molecule by its ChemSpider identifier. A security token is required to access this service.
    Example workflow
  • GetCompoundThumbnail
    Returns an image of a molecule’s 2D structure in PNG format. A security token is required to access this service.
    Example workflow
  • GetRecordDetails
    Returns the record details (CSID, InChIKey, InChI, SMILES). This operation is deprecated and will be removed soon – use GetCompoundInfo instead.
    No example workflow due to deprecation of this operation.
  • GetRecordImage
    Returns an image of a molecule’s structure in PNG format. This operation is deprecated and will be removed soon – use GetCompoundThumbnail instead.
    No example workflow due to deprecation of this operation.
  • Mol2CSID
    Searches for structures matching a given MOL file within a given range. Returns a list of ChemSpider identifiers associated with matching structures. A security token with the ‘specific’ role is required to access this service.
    No example workflow due to specific role required for the operation.
  • MolAndDS2CSID
    Searches for structures matching a given MOL file within the range specified by search options and within the specified list of datasources. Returns a list of ChemSpider identifiers. A security token with the ‘specific’ role is required to access this service.
    No example workflow due to specific role required for the operation.
  • SimpleSearch
    Performs a search using a given set of terms. Returns a list of ChemSpider identifiers. A security token is required to access this service.
    Example workflow
  • SimpleSearch2IdList
    Searches for whatever terms are entered and returns a list of ChemSpider IDs. This operation is deprecated and will be removed soon – use SimpleSearch instead.
    No example workflow due to deprecation of this operation.
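
The asynchronous operations follow a start, poll, fetch pattern. AsyncSimpleSearch returns a transaction ID, GetAsyncSearchStatus is queried until the search has finished, and GetAsyncSearchResult then returns the identifiers. The Python sketch below captures that loop with the three service calls passed in as callables, so the security token can be bound inside them. The status names (`ResultReady`, `Failed`, `TooManyRecords`, `Created`, `Processing`) are assumptions and should be checked against the values the service actually returns.

```python
import time
from typing import Callable, List, Tuple


def run_async_search(start: Callable[[str], str],
                     get_status: Callable[[str], str],
                     get_result: Callable[[str], List[int]],
                     query: str,
                     done: Tuple[str, ...] = ("ResultReady",),
                     failed: Tuple[str, ...] = ("Failed", "TooManyRecords"),
                     interval: float = 2.0,
                     timeout: float = 120.0) -> List[int]:
    """AsyncSimpleSearch, then poll GetAsyncSearchStatus, then GetAsyncSearchResult."""
    rid = start(query)                        # transaction ID returned by the search
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(rid)
        if status in done:
            return get_result(rid)            # ChemSpider identifiers
        if status in failed:
            raise RuntimeError(f"search {rid} ended with status {status}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"search {rid} still {status} after {timeout} s")
        time.sleep(interval)


# Stubs that simulate a search finishing on the third status check.
statuses = iter(["Created", "Processing", "ResultReady"])
ids = run_async_search(lambda q: "rid-1", lambda rid: next(statuses),
                       lambda rid: [101, 102], "benzene", interval=0)
print(ids)  # [101, 102]
```

SearchByFormulaAsync and SearchByMassAsync also return a transaction ID that can be polled with GetAsyncSearchStatus, so the same loop applies to them.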

Synonyms Web service

WSDL: http://www.chemspider.com/Synonyms.asmx?WSDL

  • GetStructureSynonyms
    Returns synonym names for a given compound represented by its MOL file.
    Example workflow

ChEBI

Chemical Entities of Biological Interest (ChEBI) is a freely available database of molecular entities focused on ‘small’ chemical compounds. The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex or conformer, identifiable as a separately distinguishable entity. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified.

ChEBI provides the following Web service with 7 operations.

WSDL: http://www.ebi.ac.uk/webservices/chebi/2.0/webservice?wsdl
In BioCatalogue: http://www.biocatalogue.org/services/2174

  • getLiteEntity
    Retrieves a list of “lite” entities containing only the ChEBI ASCII name and ChEBI identifier. The input parameters are a search string and a search category. If the search category is null, all fields are searched. The search string accepts the wildcard character “*” as well as Unicode characters. A maximum of 5000 entries can be retrieved at a time.
    Example workflow
  • getCompleteEntity
    Retrieves the complete record of a molecule including synonyms, database links and chemical structures, using the ChEBI identifier.
    Example workflow
  • getCompleteEntityByList
    Given a list of ChEBI accession numbers, retrieves the complete entity record associated with each accession number. The maximum size of a given list is 50.
    Example workflow
  • getOntologyParents
    Retrieves the ontology parents of an entity including the relationship type, using a ChEBI identifier.
    Example workflow
  • getOntologyChildren
    Retrieves the ontology children of an entity including the relationship type, using a ChEBI identifier.
    Example workflow
  • getAllOntologyChildrenInPath
    Retrieves all ontology children of an entity along the path of a given relationship type, using a ChEBI identifier.
    No example workflow available.
  • getStructureSearch
    Performs a substructure, similarity or identity search using a query structure.
    Example workflow
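
getCompleteEntityByList accepts at most 50 accession numbers per call. A longer list therefore has to be split into batches, each sent as a separate request, with the results concatenated afterwards. The Python sketch below covers only the splitting step. The identifiers are illustrative placeholders, and the service call itself is omitted.

```python
from typing import Iterator, List, Sequence

MAX_IDS_PER_CALL = 50  # getCompleteEntityByList limit stated above


def batches(ids: Sequence[str], size: int = MAX_IDS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` identifiers."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


chebi_ids = [f"CHEBI:{n}" for n in range(1, 121)]  # illustrative identifiers
print([len(b) for b in batches(chebi_ids)])  # [50, 50, 20]
```

The same reasoning applies to getLiteEntity, whose results are capped at 5000 entries. A broad wildcard search may therefore need a narrower search string to return complete results.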

PubChem

PubChem is a free database of chemical structures of small organic molecules and information on their biological activities provided by the National Center for Biotechnology Information (NCBI), part of the United States National Institutes of Health (NIH).

PubChem provides the following Web service with 28 operations.

WSDL: http://pubchem.ncbi.nlm.nih.gov/pug_soap/pug_soap.cgi?wsdl
In BioCatalogue: http://www.biocatalogue.org/services/2176

  • AssayDownload
    Given an assay key, prepares a file for download which contains an assay data table in the selected format. See the assay query section of the PUG service documentation (http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html) for more details on the supported formats. Compression is optional and defaults to gzip (.gz). Returns a download key. Asynchronous.
  • GetAssayColumnDescription
    Returns the description of a column (readout) in a BioAssay, which may be the outcome, score, or a TID from the given AID. Synchronous.
  • GetAssayColumnDescriptions
    Returns the description of all columns (readouts) in a BioAssay. Synchronous.
  • GetAssayDescription
    Returns the descriptive information for a BioAssay, including the number of user-specified readouts (TIDs) and whether a score readout is present. Optionally returns version information. Synchronous.
  • GetDownloadUrl
    Given a download key, returns an FTP URL that may be used to download the requested file. Synchronous.
  • GetEntrezKey
    Given a list key, returns an Entrez history key (db, query key, and WebEnv) corresponding to that list. Synchronous.
    Example workflow
  • GetEntrezUrl
    Given an Entrez history key (db, query key, and WebEnv), returns an HTTP URL that may be used to view the list in Entrez. Synchronous.
    Example workflow
  • GetIDList
    Given a list key, returns the identifiers as an array of integers. Synchronous.
    Example workflow
  • GetListItemsCount
    Returns the number of IDs in the set represented by a given list key. Synchronous.
  • GetOperationStatus
    Given a key for any asynchronous operation, returns the status of that operation. Possible return values are: Success, the operation completed normally; HitLimit, TimeLimit: the operation finished normally, but one of the limits was reached (e.g. before the entire database was searched); ServerError, InputError, DataError, Stopped: there was a problem with the input or on the server, and the job has died; Queued: the operation is waiting its turn in the public queue; Running: the operation is in progress. Synchronous.
    Example workflow
  • GetStandardizedCID
    Given a structure key that has been processed by Standardize, returns the corresponding PubChem Compound database CID, or an empty value if the structure is not present in PubChem. Synchronous.
  • GetStandardizedStructure
    Given a structure key that has been processed by Standardize, returns the chemical structure as a SMILES or InChI string. Synchronous.
  • GetStandardizedStructureBase64
    Given a structure key that has been processed by Standardize, returns the chemical structure as ASN, XML, or SDF, returned as a Base64-encoded string. Synchronous.
  • GetStatusMessage
    Given a key for any asynchronous operation, returns any system messages (error messages, job info, etc.) associated with the operation, if any. Synchronous.
  • IdentitySearch
    Searches PubChem Compound for structures identical to the one given by the structure key input based on a user-selected level of chemical identity: connectivity only, match isotopes and/or stereo, etc. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
    Example workflow
  • InputAssay
    Specifies an assay table from a BioAssay AID. The table may be complete, concise, or include a ListKey-specified set of readouts (TIDs). By default, all tested substances are included, but can be restricted to a ListKey-specified set of SIDs or CIDs. Returns an assay key. Synchronous.
  • InputEntrez
    Configures an Entrez history key (db, query key, and WebEnv). Returns a list key. Synchronous.
  • InputList
    Configures a set of identifiers for a PubChem database, as an array of integers. Returns a list key. Synchronous.
  • InputListText
    Configures a set of identifiers for a PubChem database, as a simple string of integer values separated by commas and/or whitespace. Returns a list key. Synchronous.
  • InputStructure
    Configures a chemical structure as a simple (one-line) string, either SMILES or InChI. Returns a structure key. Synchronous.
    Example workflow
  • InputStructureBase64
    Configures a chemical structure in ASN.1 (text or binary), XML, or SDF format. The structure must be encoded as a Base64 string. Currently only single structures are supported. Returns a structure key. Synchronous.
    Example workflow
  • MFSearch
    Searches PubChem Compound for structures of a given molecular formula, optionally allowing elements not specified to be present. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
  • ScoreMatrix
    Computes a matrix of scores from one or two lists of IDs (if one, the IDs will be self-scored), of the selected type and in the selected format. Compression is optional and defaults to gzip (.gz). Returns a download key. Asynchronous.
  • SimilaritySearch2D
    Searches PubChem Compound for structures similar to the one given by the structure key input, based on the given Tanimoto-based similarity score. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
    Example workflow
  • Standardize
    Standardizes the structure given by the structure key input, using the same algorithm PubChem uses to construct the Compound database. Returns a structure key. Asynchronous.
  • SubstructureSearch
    Searches PubChem Compound for structures containing the one given by the structure key input, based on a user-selected level of chemical identity: connectivity only, match isotopes and/or stereo, etc. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
    Example workflow
  • SuperstructureSearch
    Searches PubChem Compound for structures contained within the one given by the structure key input, based on a user-selected level of chemical identity: connectivity only, match isotopes and/or stereo, etc. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
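
The PUG operations are chained through keys. For example, InputStructure returns a structure key, and SimilaritySearch2D turns it into a list key asynchronously. GetIDList then reads the identifiers once GetOperationStatus reports that the search has finished. The Python sketch below covers two client-side pieces of that chain. The first is Base64 handling for InputStructureBase64 and GetStandardizedStructureBase64. The second is a polling loop that groups the GetOperationStatus values listed above. `get_status` stands in for whatever client call invokes GetOperationStatus. The handling of an `eStatus_` prefix is an assumption about how the enum values appear on the wire.

```python
import base64
import time
from typing import Callable

# Status groups as described for GetOperationStatus above.
FINISHED = {"Success", "HitLimit", "TimeLimit"}   # HitLimit/TimeLimit: possibly partial
FAILED = {"ServerError", "InputError", "DataError", "Stopped"}
PENDING = {"Queued", "Running"}


def encode_structure(text: str) -> str:
    """Base64-encode a text structure record (e.g. SDF) for InputStructureBase64."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_structure(b64: str) -> str:
    """Decode a GetStandardizedStructureBase64 reply (text formats only, not binary ASN.1)."""
    return base64.b64decode(b64).decode("utf-8")


def normalize(status: str) -> str:
    # Assumption: raw enum values may carry a prefix such as "eStatus_".
    prefix = "eStatus_"
    return status[len(prefix):] if status.startswith(prefix) else status


def wait_for(key: str, get_status: Callable[[str], str],
             interval: float = 3.0, timeout: float = 300.0) -> str:
    """Poll GetOperationStatus until an asynchronous operation stops running."""
    deadline = time.monotonic() + timeout
    while True:
        status = normalize(get_status(key))
        if status in FINISHED:
            return status
        if status in FAILED:
            raise RuntimeError(f"operation {key} failed: {status}")
        if status not in PENDING:
            raise ValueError(f"operation {key}: unexpected status {status!r}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"operation {key} still {status} after {timeout} s")
        time.sleep(interval)


print(encode_structure("abc"))  # YWJj
```

A HitLimit or TimeLimit result is treated as finished, as the status description above states. A caller may still want to flag it, because the result set may not cover the whole database.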