GeneTrail2 RESTful API

GeneTrail2 is fully scriptable via a RESTful API. This allows users to process larger enrichment studies in an automated fashion or to integrate GeneTrail2 into existing tools. As the API relies solely on standard HTTP requests, no special libraries or software are required, and bindings can be created for any programming language. In the following, we introduce the basic concepts needed for working with the API. If you are looking for documentation of all implemented methods, see our API reference.

Introduction

RESTful APIs represent resources as URLs on the server. Actions on these URLs are usually carried out using the standard HTTP verbs GET, POST, PUT, and DELETE. GeneTrail2, for example, is built around the concepts of Sessions, Jobs, and Resources. A Session is a collection of Resources such as score lists, expression matrices, categories, etc. Resources are usually produced by executing a Job. To obtain a new Session object in GeneTrail2, we can use the following request:

GET /api/session

This might produce the following JSON-formatted output:

{ "session": "116654b3-7b2e-489b-ab66-2a377b9928c7" }
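
As a minimal sketch, the request above could be issued from Python with the standard library's http.client module. The host name is the one used by the example module later on this page. The helper names parse_session and request_session are illustrative and not part of GeneTrail2.

```python
import json
from http.client import HTTPConnection

# Host taken from the example module further down this page.
HOST = "genetrail2.bioinf.uni-sb.de"


def parse_session(body):
    """Extract the session identifier from a JSON response body (bytes)."""
    document = json.loads(body.decode("utf-8"))
    return document["session"]


def request_session(connection):
    """Send GET /api/session over a connection and return the identifier."""
    connection.request("GET", "/api/session",
                       headers={"Accept": "application/json"})
    response = connection.getresponse()
    if response.status != 200:
        raise RuntimeError(
            "Could not obtain a session, HTTP status %d" % response.status)
    return parse_session(response.read())
```

Calling request_session(HTTPConnection(HOST)) requires network access to the server. Any object that offers the same request and getresponse methods can be passed instead, which makes the helper easy to try out offline.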

Using this session identifier, we can now upload files, start computations, and display results. If a session is no longer needed, we can delete it and all its contents by issuing the following request:

DELETE /api/session/116654b3-7b2e-489b-ab66-2a377b9928c7
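
One way to make sure a session is always cleaned up is to wrap its lifetime in a context manager. The sketch below is one possible pattern, not something GeneTrail2 prescribes. The names session_path and temporary_session are hypothetical, and the functions that actually talk to the server are supplied by the caller.

```python
from contextlib import contextmanager


def session_path(session_id):
    """Build the URL path that addresses a single session."""
    return "/api/session/" + session_id


@contextmanager
def temporary_session(create, delete):
    """Yield a session identifier and delete the session afterwards.

    `create` returns a new session identifier; `delete` receives the
    URL path of the session to remove. Both are supplied by the caller.
    """
    session_id = create()
    try:
        yield session_id
    finally:
        # Runs even if the body of the with-block raises.
        delete(session_path(session_id))
```

With an open http.client connection, delete could be a small function that sends a DELETE request to the given path and reads the response.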

To display a resource, we can use the following request:

GET /api/resource/988?session=b0e9a4aa-345d-45f1-bc37-0e059ccd907c

Here session=b0e9a4aa-345d-45f1-bc37-0e059ccd907c specifies the session from which the resource should be retrieved and 988 is the identifier of the Resource object. The generated response looks like this:

{
  "id": 988,
  "session": "b0e9a4aa-345d-45f1-bc37-0e059ccd907c",
  "createdBy": "max-mean",
  "organism": 9606,
  "comment": "",
  "metadata": {
    "significance": 0.05,
    "input_file": 985,
    "parameters": {
      "significance": "0.05",
      "adjustment": "benjamini_hochberg",
      "minimum": "3",
      "maximum": "500",
      "permutations": "1000000",
      "input_file": "985",
      "adjustSeparately": "true",
      "algorithm": "max-mean"
    },
    "warnings": [],
    "algorithm": "max-mean"
  },
  "shared": false,
  "intermediate": false,
  "normalized": true,
  "displayName": "mRNA - Blastemal vs. Non-Blastemal - Max-Mean",
  "mediaType": "application/zip",
  "type": "Enrichment",
  "creationDate": 1430830959361,
  "identifier": "Gene-Symbol",
  "modificationDate": 1431681088991,
  "algorithm": "max-mean",
  "pipeline": {}
}
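
The following sketch shows how a client might read a few fields from such a response. It rests on two observations about the sample above rather than on documented guarantees: the values under metadata.parameters are strings, and creationDate has the magnitude of a timestamp in milliseconds since the Unix epoch. The latter is an assumption, as is the helper name to_datetime.

```python
import json
from datetime import datetime, timezone

# Excerpt of the response shown above.
RESPONSE = """
{
  "id": 988,
  "displayName": "mRNA - Blastemal vs. Non-Blastemal - Max-Mean",
  "creationDate": 1430830959361,
  "metadata": {
    "significance": 0.05,
    "parameters": {"significance": "0.05", "minimum": "3", "maximum": "500"}
  }
}
"""


def to_datetime(millis):
    """Convert a timestamp to UTC, ASSUMING milliseconds since the epoch."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


resource = json.loads(RESPONSE)
created = to_datetime(resource["creationDate"])

# Values under metadata.parameters are strings in the sample response,
# so convert them before doing arithmetic or comparisons.
parameters = resource["metadata"]["parameters"]
minimum = int(parameters["minimum"])
significance = float(parameters["significance"])

print(resource["displayName"], created.date(), minimum, significance)
```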

Example

So what does a script using the GeneTrail2 API look like in practice? Suppose you want to compute multiple enrichments from a matrix of gene expression values. To accomplish this, we will write a Python script. We start with the main procedure, which calls helper functions that do the actual work.

import json

# Load method definitions
from genetrail2 import *
# Load assignment of samples into groups
from dataGroups import *

# Obtain a session
key = getSession()

# Upload the input data to the server
matrixId = uploadFile(key, 'mrnaAllSamples.tsv')['id']

# Compute scores for the input data and the
# data groups using the shrinkage-t-test
# The first call will only create the job object
# on the server, but will not yet compute anything.
setupScoring(key, 'independent-shrinkage-t-test',
  file1 = matrixId,
  sg = json.dumps(groups['sg']),
  rg = json.dumps(groups['rg'])
)

# Run the actual computation
scores = runJob(key)['scores']['id']

# Create a list of categories for which we
# want to compute our enrichments
categories = [
  '9606-gene-go-biologicalprocess',
  '9606-gene-kegg-pathways',
  '9606-gene-reactome-pathways',
  '9606-gene-pfam-proteinfamilies',
]

# Create and run the job for the enrichment.
# We use the GSEA algorithm here.
setupEnrichment(key, 'gsea', scores, categories)
result = runJob(key)['enrichment']['id']

# Download and store the results
downloadResult(key, result, 'mrnaAllSamples.gsea.zip')

You can download the example input file here. The dataGroups.py script can be downloaded here.
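
For orientation, the main script only relies on dataGroups defining a dictionary named groups with the keys 'sg' and 'rg'. A hypothetical stand-in could look like the sketch below. The sample names are placeholders, and reading 'sg' as the sample group and 'rg' as the reference group is an assumption based on the key names.

```python
import json

# Hypothetical dataGroups.py: the sample names below are placeholders
# and would presumably have to match the samples in the uploaded matrix.
groups = {
    # Presumably the sample group ...
    "sg": ["sample_01", "sample_02", "sample_03"],
    # ... and the reference group it is compared against.
    "rg": ["sample_04", "sample_05", "sample_06"],
}

# The main script serializes each group as a JSON array before
# passing it to setupScoring.
sg_argument = json.dumps(groups["sg"])
rg_argument = json.dumps(groups["rg"])
print(sg_argument)
```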

The methods getSession, uploadFile, setupScoring, setupEnrichment, runJob, and downloadResult are defined in the genetrail2 module:

import urllib.parse
import urllib.request
from http.client import HTTPConnection

import json
import time
import os

# Set some constants such as the server URL
baseurl = "genetrail2.bioinf.uni-sb.de"
basepath= ""

# Open a connection to the server
con = HTTPConnection(baseurl)
con.connect()

# Helper method that parses the results from the server.
# Valid results are formatted as JSON strings.
def handleResponse(response):
    message = response.read()
    try:
        result = json.loads(message.decode("utf-8"))
    except ValueError:
        raise ValueError("Unexpected response from server: " + str(message))

    return (result, response.status)

# A wrapper for GET methods
def doGet(endpoint):
    # Ask the server for a JSON response
    headers = {
        "Accept": "application/json"
    }

    con.request("GET", basepath + endpoint, headers=headers)

    return handleResponse(con.getresponse())

# A wrapper for POST methods
# This method places arbitrary key-value arguments
# into the request body.
def doPost(endpoint, **kwargs):
    headers = {
        "Content-type": "application/x-www-form-urlencoded"
    }

    data = urllib.parse.urlencode(kwargs)

    con.request("POST", basepath + endpoint, data, headers)

    return handleResponse(con.getresponse())

# Create a new session
def getSession():
    response, code = doGet('/api/session')

    if code != 200:
        raise ValueError("Unexpected server response. Unable to obtain API key.")

    print("Key is: " + response["session"])

    return response["session"]

# Upload a file to the server
def uploadFile(key, path):
    with open(path, 'r') as f:
        content = f.read()

    # This uses the 'application/x-www-form-urlencoded' MIME type
    # for requests. For large files, the multipart encoding may be more
    # efficient. Additionally, set the displayName of the file to its
    # on-disk file name. When browsing the results via the Web GUI, this
    # name will be shown for the uploaded file.
    res, status = doPost('/api/upload?session=' + key,
        value = content,
        displayName = os.path.basename(path)
    )

    if res["status"] != "success":
        raise ValueError("Error during upload: " + res["message"])

    return res["results"]["result"]

# Set up an enrichment computation with the most common parameter
# settings. 'scores' is the identifier of the score resource to use
# as input.
def setupEnrichment(key, method, scores, categories, **kwargs):
    res, status = doPost('/api/job/setup/%s?session=%s' % (method, key),
        significance=0.05,
        adjustment="benjamini_hochberg",
        categories=json.dumps(categories),
        minimum=3,
        maximum=500,
        adjustSeparately=True,
        input=str(scores),
        **kwargs
    )

    if res["status"] != "success":
        raise ValueError("Could not set up service: " + res["message"])

# Create a scoring job
def setupScoring(key, method, **kwargs):
    res, status = doPost('/api/job/setup/scoring?session=%s' % key,
        method = method,
        **kwargs
    )

# Execute the current job. This method polls for the job status
# every two seconds. It returns the results once the computation
# has finished and raises an error if it was aborted.
def runJob(key):
    doGet('/api/job/start?session=' + key)
    while True:
        time.sleep(2)
        # Has the computation completed yet?
        res, code = doGet('/api/job/query?session=' + key)
        if res["status"] == 'status':
            print(res["message"])
        elif res["status"] == 'success':
            return res["results"]
        else:
            raise ValueError("Unexpected status during computation: '" + res["message"] + "'")

# Download a resource given its identifier
def downloadResult(key, res, path):
    url = "http://%s%s/api/resource/%s/download/?session=%s" % (baseurl, basepath, str(res), key)
    urllib.request.urlretrieve(url, path)
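
Note that runJob polls without an upper time limit. For unattended scripts, a variant with a deadline may be preferable. The following is a sketch of that idea, not part of the genetrail2 module: wait_for_job and its parameters are hypothetical, the status values 'status' and 'success' are taken from runJob above, and, unlike runJob, it queries the server once before the first pause.

```python
import time


def wait_for_job(query, interval=2.0, timeout=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `query` until the job succeeds, fails, or `timeout` elapses.

    `query` is a caller-supplied function returning the parsed JSON
    document of /api/job/query, as used in runJob.
    """
    deadline = clock() + timeout
    while True:
        res = query()
        if res["status"] == "success":
            return res["results"]
        if res["status"] != "status":
            # Any other status is treated as an error, as runJob does.
            raise ValueError("Unexpected status during computation: %r"
                             % res.get("message"))
        if clock() >= deadline:
            raise TimeoutError(
                "Job did not finish within %s seconds" % timeout)
        sleep(interval)
```

After starting the job, this could be called as wait_for_job(lambda: doGet('/api/job/query?session=' + key)[0]). Passing sleep and clock as parameters keeps the function easy to exercise without real waiting.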