Ondex workflow module development manual

[Still to write: What is an Ondex workflow? Mention it's open source.]

Prerequisites and requirements

This manual assumes that the reader knows how to write Java 1.6 programs and how to use Subversion and Maven 2.

To follow the steps described in this manual, you need a computer with Sun JDK 1.6, the latest version of Subversion, and Maven 2 installed.

Preparations

The following steps are only required until the Ondex workflow engine is deployed to the Maven repository at Rothamsted Research.

  1. Check out the Ondex workflow engine from sourceforge.net using the command:
    svn co https://ondex.svn.sourceforge.net/svnroot/ondex/trunk/ondex_parent ondex_parent
  2. Change into the resulting directory ondex_parent and compile the contents using the command:
    mvn install

Your local maven repository now contains all required Ondex libraries.

Creating a new workflow module

  1. Create a new maven project, by creating a new folder containing a pom file. Specify the following artifact as the pom's parent:
    • Group ID: net.sourceforge.ondex
    • Artifact ID: modules
    • Version: 0.0.1-SNAPSHOT
  2. Add the following repository reference to the pom:
  3. Create the standard Maven Java source directory (src/main/java) in your new folder.
  4. Create a new package inside src/main/java:
    • net.sourceforge.ondex

Using external libraries

As a general rule one can make use of any external library by defining it as a dependency in the pom file. During compile time all dependencies will be automatically downloaded and merged into the resulting plug-in jar file.

If a required library is already used by the Ondex workflow engine, it is of course not necessary to merge it into the jar file. In this case, simply set its dependency scope to provided. Here is a list of all libraries that are already in use:

  • blowfish:blowfishj:2.14
  • commons-cli:commons-cli:1.2
  • com.sleepycat:je:3.3.75
  • com.thoughtworks.xstream:xstream:1.3.1
  • fastutil:fastutil:5.1.5
  • log4j:log4j:1.2.12
  • net.sourceforge.ondex:oxl-module:0.0.1
  • net.sourceforge.ondex:tools:0.0.1
  • org.apache.lucene:lucene-core:2.4.0
  • org.codehaus.staxmate:stax2:2.1
  • org.codehaus.woodstox:stax2-api:3.0.1
  • org.codehaus.woodstox:woodstox-core-lgpl:4.0.3
  • stax:stax-api:1.0.1
  • xerces:xercesImpl:2.9.1
  • xml-apis:xml-apis:1.3.04
  • xpp3:xpp3_min:1.1.4c

Creating a workflow component

There are several different types of workflow components in Ondex:

  • Parsers: A parser is a workflow component that creates the contents of an Ondex graph according to information in a set of files. It usually targets a specific database or file format. The KEGG parser and the SBML parser are two examples.
  • Mappings: A mapping creates relations between different parts of an Ondex graph. For example, the 'accession-based' mapping usually maps concepts from different data sources to one another. The 'blast-based' mapping creates relations between sequence-bearing concepts according to their similarity.
  • Filters: A filter extracts a subset from the graph. For example, the all-pairs shortest path filter yields all concepts and the subset of relations that are part of shortest paths between any two concepts.
  • Transformers: A transformer is a workflow component that transforms the graph from one configuration into another. For example, the relation collapse transformer merges concepts that are connected by a certain relation.
  • Statistics: This special kind of workflow component runs statistical analyses on the graph and outputs them to the file system. For example: The GOA quality statistic outputs a specificity analysis over GeneOntology annotations.

For more information about the different workflow components, please see the plugin documentation, which starts on page 73 of the Ondex user guide.

[Still to do: Add class diagram.]

Common technical aspects

All workflow components have some important common technical aspects.

Type and ID

A component's type and ID are defined by its fully qualified class name. This is accomplished by obeying the following naming convention:

net.sourceforge.ondex.<lctype>.<id>.<Uctype>

where <lctype> represents the component's type name in lower case, <Uctype> the same name with a capitalised first letter, and <id> the component's identifier.

For example: The class net.sourceforge.ondex.parser.foobar.Parser is a parser component identified as 'foobar'.
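
The convention can be checked mechanically. The following is a minimal sketch, assuming that <id> is a single package segment and that <Uctype> is <lctype> with its first letter capitalised. The class ComponentName is a hypothetical helper written for this manual; it is not part of the Ondex API.

```java
// Hypothetical helper, not part of the Ondex API.
final class ComponentName {

    static final String PREFIX = "net.sourceforge.ondex.";

    final String type; // e.g. "parser"
    final String id;   // e.g. "foobar"

    private ComponentName(String type, String id) {
        this.type = type;
        this.id = id;
    }

    // Splits a class name of the form net.sourceforge.ondex.<lctype>.<id>.<Uctype>
    // into its type and ID, rejecting names that do not follow the convention.
    static ComponentName parse(String className) {
        if (!className.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not in the Ondex namespace: " + className);
        }
        String[] parts = className.substring(PREFIX.length()).split("\\.");
        if (parts.length != 3 || parts[0].length() == 0 || parts[1].length() == 0) {
            throw new IllegalArgumentException("Expected <lctype>.<id>.<Uctype>: " + className);
        }
        // The simple class name must be the type name with a capital first letter.
        String expected = Character.toUpperCase(parts[0].charAt(0)) + parts[0].substring(1);
        if (!parts[2].equals(expected)) {
            throw new IllegalArgumentException("Class should be named " + expected + ": " + className);
        }
        return new ComponentName(parts[0], parts[1]);
    }

    public static void main(String[] args) {
        ComponentName name = parse("net.sourceforge.ondex.parser.foobar.Parser");
        System.out.println(name.type + " / " + name.id); // prints "parser / foobar"
    }
}
```

A class such as net.sourceforge.ondex.parser.foobar.Mapping would be rejected by this check, because the simple class name does not match the type package.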

In future these restrictive definitions will become obsolete, as it is planned to introduce OSGi technology to solve this issue.

Name and version info

Each workflow component features two methods called String getName() and String getVersion().

The getName() method allows you to state a full name for your component. For example: 'PSI Molecular Interaction Format Parser' would be the name for the Parser with ID psimi.

In the getVersion() method you can specify a version tag, usually the date of the last update.

Arguments

All workflow components can require arguments. The developer decides whether a component requires arguments and, if so, how many. Argument requirements are defined using objects of the type 'ArgumentDefinition', which is explained in the next section.

Argument Definitions

Ondex's workflow API provides a set of argument definition types, all of which can be found in the package net.sourceforge.ondex.args in the artifact net.sourceforge.ondex:workflow-api.

  • String Argument Definition
  • Integer Range Argument Definition
  • Float Range Argument Definition
  • Boolean Argument Definition
  • Sequence-type Argument Definition
  • File Argument Definition
  • URL Argument Definition

If an argument you require does not suit any of these Argument Definitions, it is possible to write a new one. Create the package net.sourceforge.ondex.args in your project and create a new class that implements the interface net.sourceforge.ondex.args.ArgumentDefinition.

Specifying Argument Definitions

Each of these argument definitions can be configured with certain properties:

  • The argument's name
  • The argument's description
  • Its default value
  • Whether it is mandatory
  • Whether multiple instances of the argument are allowed

Example:

String argname = "Query";
String argdesc = "The query string";
 
ArgumentDefinition<?>[] args = new ArgumentDefinition<?>[]{
  //                           name   , description, required, defaultValue, multipleInstancesAllowed
  new StringArgumentDefinition(argname, argdesc    , true    , "SELECT *"  , false)
};

Each workflow component features a method called ArgumentDefinition<?>[] getArgumentDefinitions() which can be used to return a set of definitions like the one above. If no arguments are required, simply let it return an empty array like this:

return new ArgumentDefinition<?>[0];

Accessing arguments

To access the arguments that were specified by the user, each workflow component possesses a method getArguments(), which returns an object of the type net.sourceforge.ondex.AbstractArguments.

This object can be queried for the argument's values. Unique arguments (multipleInstancesAllowed = false) can be accessed using the method Object getUniqueValue(String name). Lists of values for non-unique arguments are returned by the method List<Object> getObjectValueList(String name). The parameter 'name' refers to the name that was given to the corresponding Argument Definition. Depending on the type of workflow component, the argument access field can have different names. These names are introduced in the respective sections below.

Example:

String query = (String) getArguments().getUniqueValue("Query");
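
The difference between the two access methods can be sketched as follows. SketchArguments is a hypothetical stand-in for AbstractArguments, written only so that the example is self-contained. How the real class stores values and what it returns for a missing argument are assumptions here, as is the "Taxon" argument.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for net.sourceforge.ondex.AbstractArguments.
final class SketchArguments {

    private final Map<String, List<Object>> values = new HashMap<String, List<Object>>();

    // Records one user-supplied value for the named argument.
    void addValue(String name, Object value) {
        List<Object> list = values.get(name);
        if (list == null) {
            list = new ArrayList<Object>();
            values.put(name, list);
        }
        list.add(value);
    }

    // Assumption: returns null when no value was supplied.
    Object getUniqueValue(String name) {
        List<Object> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Assumption: returns an empty list when no value was supplied.
    List<Object> getObjectValueList(String name) {
        List<Object> list = values.get(name);
        return list == null ? new ArrayList<Object>() : list;
    }

    public static void main(String[] args) {
        SketchArguments arguments = new SketchArguments();
        arguments.addValue("Query", "SELECT *");
        arguments.addValue("Taxon", "4932"); // "Taxon" is a made-up, non-unique argument
        arguments.addValue("Taxon", "9606");

        // Unique argument: a single value, cast to the type its definition implies.
        String query = (String) arguments.getUniqueValue("Query");

        // Non-unique argument: iterate over the list and cast each element.
        List<String> taxa = new ArrayList<String>();
        for (Object o : arguments.getObjectValueList("Taxon")) {
            taxa.add((String) o);
        }
        System.out.println(query + " " + taxa);
    }
}
```

Because both methods return Object values, the component is responsible for casting to the type that matches the corresponding Argument Definition.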

Accessing the Ondex graph

The Ondex graph that the workflow component is working on is referenced as a field called ONDEXGraph graph. To find out how to access and manipulate the Ondex graph, please refer to the Ondex JavaDoc and the Ondex graph API manual.

Important: To ensure that you use the Ondex graph in a way that is consistent with other Ondex applications, please ensure that you follow the Ondex semantics guidelines.

Lookup functions (validators)

Requesting lookup functions

Lookup functions (called validators in Ondex) are very similar to workflow components, and share the same identifier conventions. Thus, you can indicate your need of assistance from any known validators to the workflow engine by stating their identifiers. This is done by implementing the method String[] requiresValidators(), where the return value is an array of the IDs of the required validators. Every workflow component features this method. If you do not require any lookup functions, simply return an empty String array:

return new String[0];

Calling lookup functions

All Ondex validators are accessible through the static field net.sourceforge.ondex.config.ValidatorRegistry.validators, which is of the type HashMap<String,AbstractONDEXValidator>. The workflow engine provides all validators requested in the requiresValidators() method (see above) in this hash map. The validator's id is used as the hash key:

AbstractONDEXValidator taxLookup = ValidatorRegistry.validators.get("taxonomy");

To use the validator simply call its validate(Object o) method. It will return the converted Object.

String ncbiTaxID = (String) taxLookup.validate("yeast");
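
Since the registry only holds validators that were requested, it can help to guard the lookup. The following is a minimal sketch in which SketchValidator and the local validators map are hypothetical stand-ins for AbstractONDEXValidator and ValidatorRegistry.validators, and the one-entry lookup table is a toy. Whether the real validate method returns null for unknown input is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class ValidatorSketch {

    // Hypothetical stand-in for AbstractONDEXValidator.
    interface SketchValidator {
        Object validate(Object o);
    }

    // Hypothetical stand-in for ValidatorRegistry.validators.
    static final Map<String, SketchValidator> validators =
            new HashMap<String, SketchValidator>();

    static {
        // Toy lookup table; a real taxonomy validator would be far more complete.
        final Map<String, String> table = new HashMap<String, String>();
        table.put("yeast", "4932");
        validators.put("taxonomy", new SketchValidator() {
            public Object validate(Object o) {
                return table.get(o); // null for unknown input (assumption)
            }
        });
    }

    // What a component would return from requiresValidators().
    static String[] requiresValidators() {
        return new String[] { "taxonomy" };
    }

    // Looks up the validator by its ID and fails early if it was never requested.
    static String lookupTaxId(String organism) {
        SketchValidator taxLookup = validators.get("taxonomy");
        if (taxLookup == null) {
            throw new IllegalStateException("Validator 'taxonomy' was not requested");
        }
        return (String) taxLookup.validate(organism);
    }
}
```

Failing early with a clear message makes a missing entry in requiresValidators() easier to diagnose than a NullPointerException at the call to validate.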

Indexing environment

The Ondex workflow engine can provide a Lucene based indexing environment for fast searches if needed.

Requesting search environment initialization

You can indicate that your component makes use of the search environment feature. Every workflow component possesses a method boolean requiresIndexedGraph(). Simply set its return value to true if you want to activate the search index.

Accessing the search environment

If you have instructed the workflow engine to create a search index over the current graph by returning true from requiresIndexedGraph(), an Apache Lucene environment will be provided for you. To access it, use the static field net.sourceforge.ondex.config.LuceneRegistry.sid2luceneEnv. It is a HashMap<Long,LuceneEnv> that uses the graph's super ID (SID) as key. Thus you access it as follows:

LuceneEnv env = LuceneRegistry.sid2luceneEnv.get(graph.getSID());

The LuceneEnv object provides the following methods to search inside the graph:

  • scoredSearchInConcepts(Query q)
  • scoredSearchInRelations(Query q)

To learn more about how to use this feature refer to the Ondex Javadoc and the Apache Lucene documentation.

Writing a parser

Creation

  1. Create a new subpackage:
    • net.sourceforge.ondex.parser.<id>
  2. Create a new Java class called Parser inside that package that extends the class net.sourceforge.ondex.parser.AbstractONDEXParser and override its abstract methods.
    Your class should now look like this:
 package net.sourceforge.ondex.parser.myparser;
 
 import net.sourceforge.ondex.args.ArgumentDefinition;
 import net.sourceforge.ondex.parser.AbstractONDEXParser;
 
 public class Parser extends AbstractONDEXParser {
 
	@Override
	public boolean readsDirectory() {
		return false;
	}
 
	@Override
	public boolean readsFile() {
		return false;
	}
 
	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}
 
	@Override
	public String getName() {
		return null;
	}
 
	@Override
	public String getVersion() {
		return null;
	}
 
	@Override
	public String[] requiresValidators() {
		return null;
	}
 
	@Override
	public void start() throws Exception {
 
	}
 
 }

Input specification

You will want to specify whether your parser will read a single file, a directory or both. Simply set the return values for the methods boolean readsDirectory() and boolean readsFile() accordingly.

Implementation

In addition to the general argument values, the getArguments() method also allows access to the input file and/or input directory. This is done using the methods getArguments().getInputFile() and getArguments().getInputDir(), respectively.
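
As an illustration, the file-reading part of a parser's start() method might look like the following minimal sketch. It assumes that getArguments().getInputFile() yields a path string, which this manual does not confirm. ParserInputSketch and its methods are hypothetical and do not use the Ondex API; creating concepts from the records is left out.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ParserInputSketch {

    // Reads all non-empty lines of the input file. In a real parser this logic
    // would sit in start(), with the path taken from getArguments().getInputFile()
    // (assumed here to be a path string).
    static List<String> readRecords(String inputFile) throws IOException {
        List<String> records = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(inputFile));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.length() > 0) {
                    records.add(line);
                }
            }
        } finally {
            reader.close(); // always release the file handle
        }
        return records;
    }

    // Writes a small temporary file and reads it back, for demonstration only.
    static int demo() {
        try {
            File tmp = File.createTempFile("ondex-sketch", ".txt");
            tmp.deleteOnExit();
            FileWriter writer = new FileWriter(tmp);
            try {
                writer.write("first record\n\nsecond record\n");
            } finally {
                writer.close();
            }
            return readRecords(tmp.getPath()).size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A parser written this way would return true from readsFile() and false from readsDirectory().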

Writing a Mapping

Creation

  1. Create a new subpackage:
    • net.sourceforge.ondex.mapping.<id>
  2. Create a new Java class called Mapping inside that package that extends the class net.sourceforge.ondex.mapping.AbstractONDEXMapping and override its abstract methods.
    Your class should now look like this:
package net.sourceforge.ondex.mapping.mymapping;
 
import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.mapping.AbstractONDEXMapping;
 
public class Mapping extends AbstractONDEXMapping {
 
	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}
 
	@Override
	public String getName() {
		return null;
	}
 
	@Override
	public String getVersion() {
		return null;
	}
 
	@Override
	public boolean requiresIndexedGraph() {
		return false;
	}
 
	@Override
	public String[] requiresValidators() {
		return null;
	}
 
	@Override
	public void start() throws Exception {
 
	}
}

Writing a filter

Creation

  1. Create a new subpackage:
    • net.sourceforge.ondex.filter.<id>
  2. Create a new Java class called Filter inside that package that extends the class net.sourceforge.ondex.filter.AbstractONDEXFilter and override its abstract methods.
    Your class should now look like this:
package net.sourceforge.ondex.filter.myfilter;
 
import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.core.ONDEXConcept;
import net.sourceforge.ondex.core.ONDEXGraph;
import net.sourceforge.ondex.core.ONDEXRelation;
import net.sourceforge.ondex.core.ONDEXView;
import net.sourceforge.ondex.filter.AbstractONDEXFilter;
 
public class Filter extends AbstractONDEXFilter {
 
	@Override
	public void copyResultsToNewGraph(ONDEXGraph exportGraph) {
 
	}
 
	@Override
	public ONDEXView<ONDEXConcept> getVisibleConcepts() {
		return null;
	}
 
	@Override
	public ONDEXView<ONDEXRelation> getVisibleRelations() {
		return null;
	}
 
	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}
 
	@Override
	public String getName() {
		return null;
	}
 
	@Override
	public String getVersion() {
		return null;
	}
 
	@Override
	public boolean requiresIndexedGraph() {
		return false;
	}
 
	@Override
	public String[] requiresValidators() {
		return null;
	}
 
	@Override
	public void start() throws Exception {
 
	}
 
}

Output

ONDEXView and ONDEXBitSet

Filters do not manipulate the graph directly, but rather provide a view on the data. This is done using the class ONDEXView, which contains a bitset over the IDs of the elements concerned. This bitset can be any implementation of the interface ONDEXBitSet, such as DefaultBitSet or SparseBitSet. You can find detailed information about this in the graph API manual.

Here is an example snippet of how to construct a new ONDEXView over some concepts:

DefaultBitSet bitset = new DefaultBitSet();
for (ONDEXConcept concept : conceptList) {
    bitset.set(concept.getId());
}
ONDEXView<ONDEXConcept> conceptView = new ONDEXView<ONDEXConcept>(graph, ONDEXConcept.class, bitset);

Output methods

The filter output is stated using three special methods:

  • void copyResultsToNewGraph(ONDEXGraph)
  • ONDEXView<ONDEXConcept> getVisibleConcepts()
  • ONDEXView<ONDEXRelation> getVisibleRelations()

It is advisable to keep ONDEXBitSets of the filtered concepts and relations as private fields in your Filter class. This makes implementing the above methods much easier.
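
One way to follow this advice is sketched below. java.util.BitSet stands in for ONDEXBitSet, and plain int IDs and int[] triples stand in for concepts and relations; none of these stand-ins are Ondex types, and the filtering rule is invented for illustration. In a real filter, start() would fill the fields and the two getVisible methods would wrap them in ONDEXView objects.

```java
import java.util.BitSet;

// Hypothetical filter skeleton that keeps its results in private bit sets.
final class FilterSketch {

    private final BitSet visibleConcepts = new BitSet();
    private final BitSet visibleRelations = new BitSet();

    // Keeps the given concepts and every relation whose two end points are both kept.
    // Each relation is given as { relationId, fromConceptId, toConceptId }.
    void start(int[] keptConceptIds, int[][] relations) {
        for (int id : keptConceptIds) {
            visibleConcepts.set(id);
        }
        for (int[] relation : relations) {
            if (visibleConcepts.get(relation[1]) && visibleConcepts.get(relation[2])) {
                visibleRelations.set(relation[0]);
            }
        }
    }

    // Return copies so that callers cannot change the filter's internal state.
    BitSet getVisibleConcepts() {
        return (BitSet) visibleConcepts.clone();
    }

    BitSet getVisibleRelations() {
        return (BitSet) visibleRelations.clone();
    }
}
```

With the results stored once in start(), each output method reduces to a few lines that read from the same two fields.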

Writing a transformer

Creation

  1. Create a new subpackage:
    • net.sourceforge.ondex.transformer.<id>
  2. Create a new Java class called Transformer inside that package that extends the class net.sourceforge.ondex.transformer.AbstractONDEXTransformer and override its abstract methods.
    Your class should now look like this:
package net.sourceforge.ondex.transformer.mytransformer;
 
import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.transformer.AbstractONDEXTransformer;
 
public class Transformer extends AbstractONDEXTransformer {
 
	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}
 
	@Override
	public String getName() {
		return null;
	}
 
	@Override
	public String getVersion() {
		return null;
	}
 
	@Override
	public boolean requiresIndexedGraph() {
		return false;
	}
 
	@Override
	public String[] requiresValidators() {
		return null;
	}
 
	@Override
	public void start() throws Exception {
 
	}
}

Writing a statistics method

Creation

  1. Create a new subpackage:
    • net.sourceforge.ondex.statistics.<id>
  2. Create a new Java class called Statistics inside that package that extends the class net.sourceforge.ondex.statistics.AbstractONDEXStatistics and override its abstract methods.
    Your class should now look like this:
package net.sourceforge.ondex.statistics.mystatistics;
 
import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.statistics.AbstractONDEXStatistics;
 
public class Statistics extends AbstractONDEXStatistics {
 
	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}
 
	@Override
	public String getName() {
		return null;
	}
 
	@Override
	public String getVersion() {
		return null;
	}
 
	@Override
	public boolean requiresIndexedGraph() {
		return false;
	}
 
	@Override
	public String[] requiresValidators() {
		return null;
	}
 
	@Override
	public void start() throws Exception {
 
	}
}

Implementation

Arguments

Like parsers, statistics methods allow the definition of input files and/or directories. Hence, in addition to the general argument values, the getArguments() method also allows access to the input file and/or input directory. This is done using the methods getArguments().getInputFile() and getArguments().getInputDir(), respectively.

Compiling and running the new module

In order to build the new workflow module proceed as follows:

  1. Go back to your main project directory (that contains your pom.xml file).
  2. Build the project using the command:
    mvn install
  3. Once your project has been built successfully, you will find the Ondex plugin jar file in your project's target/ directory under the name <artifactID>-<version>-jar-with-dependencies.jar (where <artifactID> and <version> are the ID and version you stated in your pom.xml file).
  4. Copy the jar file into the plugins/ directory of your local Ondex installation. You can now run a workflow using your new components.
