Ondex workflow module development manual

For Ondex version 0.1.3

Prerequisites and requirements

This manual assumes that the reader is familiar with writing Java 1.6 programs and with using the tools Subversion and Maven2.

To follow the steps described in this manual, you need a computer with Sun JDK 1.6, the latest version of Subversion, and Maven2 installed.

Preparations

The following steps are only required until the Ondex workflow engine is deployed to the Maven repository at Rothamsted Research.

  1. Check out the Ondex workflow engine from sourceforge.net using the command:
    svn co https://ondex.svn.sourceforge.net/svnroot/ondex/trunk/ondex_parent ondex_parent
  2. Change into the resulting directory ondex_parent and compile the contents using the command:
    mvn install

Your local Maven repository now contains all required Ondex libraries.

Creating a new workflow module

  1. Create a new Maven project by creating a new folder containing a pom.xml file. Specify the following artifact as the POM's parent:
    • Group ID: net.sourceforge.ondex
    • Artifact ID: modules
    • Version: 0.1.3-SNAPSHOT
  2. Add the following snapshot repository reference to the settings.xml file of your Maven installation:
    • http://ondex.rothamsted.bbsrc.ac.uk/nexus/content/groups/public-snapshots
  3. Create the standard Maven Java source directory (src/main/java) in your new folder.
  4. Create a new Java package inside src/main/java

Dealing with Maven bugs

There are currently a few Maven bugs that make creating new workflow components quite a hassle. To save you unnecessary frustration, we list a few workarounds here that help you bypass these problems.

Artifact version bug

Maven currently suffers from a bug that does not allow an artifact to have a version number that differs from its parent's version number. Thus your artifact's version must match Ondex's latest version number: 0.1.3-SNAPSHOT.

Parent dependency resolution bug

Another bug in Maven currently prevents it from correctly resolving the download URL of parent dependencies. This means that your builds will fail unless you follow the instructions below:

  1. Follow this link in your browser. You will see a listing of all snapshot builds of the Ondex modules artifact. The file names contain identifiers for the different snapshot builds, consisting of the date, time and number of the build (for example, 20101019.002606-323 identifies build number 323 from 19/10/2010, 00:26:06).
  2. Scroll to the bottom of the page to find the latest build and copy its build identifier.
  3. In your pom file, go to the version field of the parent declaration and replace the string "SNAPSHOT" with the build ID you just copied from the browser (for example, 0.1.3-SNAPSHOT becomes 0.1.3-20101019.002606-323).
  4. Build your artifact using the command
    mvn install
    It should now succeed in downloading the required dependencies.
  5. After your build has succeeded, you can change the parent version field back to what it was before. It should now build just fine.
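The identifier format from step 1 and the substitution from step 3 can be checked with a few lines of Java. The class below is a hypothetical helper, not part of Ondex or Maven: it splits an identifier such as 20101019.002606-323 into its timestamp and build number, and builds the pinned parent version. Treating the timestamp as UTC is an assumption of this sketch.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/**
 * Hypothetical helper (not part of Ondex or Maven) for snapshot build
 * identifiers of the form yyyyMMdd.HHmmss-buildNumber, e.g. "20101019.002606-323".
 */
class SnapshotBuildId {
    final Date timestamp;
    final int buildNumber;

    private SnapshotBuildId(Date timestamp, int buildNumber) {
        this.timestamp = timestamp;
        this.buildNumber = buildNumber;
    }

    /** Splits the identifier at its last '-' into timestamp and build number. */
    static SnapshotBuildId parse(String id) {
        int dash = id.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("missing build number: " + id);
        }
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd.HHmmss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: timestamps are UTC
        format.setLenient(false);
        try {
            Date timestamp = format.parse(id.substring(0, dash));
            int build = Integer.parseInt(id.substring(dash + 1));
            return new SnapshotBuildId(timestamp, build);
        } catch (ParseException e) {
            throw new IllegalArgumentException("malformed timestamp: " + id, e);
        }
    }

    /** Step 3: replaces "SNAPSHOT" in e.g. "0.1.3-SNAPSHOT" with the build identifier. */
    static String pinnedVersion(String snapshotVersion, String buildId) {
        if (!snapshotVersion.endsWith("-SNAPSHOT")) {
            throw new IllegalArgumentException("not a snapshot version: " + snapshotVersion);
        }
        String base = snapshotVersion.substring(0, snapshotVersion.length() - "SNAPSHOT".length());
        return base + buildId;
    }
}
```

For example, SnapshotBuildId.pinnedVersion("0.1.3-SNAPSHOT", "20101019.002606-323") returns 0.1.3-20101019.002606-323, the value to put into the parent's version field.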

Using external libraries

As a general rule, you can use any external library by declaring it as a dependency in the pom file. During the build, all dependencies are downloaded automatically and merged into the resulting plugin jar file.

If a required library is already used by the Ondex workflow engine, it is of course not necessary to merge it into the jar file. In this case, simply set its dependency scope to provided. Here is a list of all libraries that are already in use:

Creating a workflow component

There are several different types of workflow components in Ondex:

For more information about the different workflow components, please see the workflow component documentation.

Common technical aspects

All workflow components share some important technical aspects.

ID, name and version information

Each workflow component features three methods called String getId(), String getName() and String getVersion().

The getId() method is used to define the ID of your component, which will be used to invoke and execute it.

The getName() method allows you to state a full name for your component. For example: 'PSI Molecular Interaction Format Parser' would be the name for the Parser with ID psimi.

In the getVersion() method you can specify a version tag, usually the date of the last update.
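A minimal sketch of these three methods, using the psimi parser from the example above. The interface ComponentIdentity is a hypothetical stand-in so that the snippet compiles on its own; in a real module the methods override those of the Ondex base class (for example ONDEXParser), and the version string shown is made up.

```java
/** Hypothetical stand-in for the identity methods that every Ondex workflow component provides. */
interface ComponentIdentity {
    String getId();
    String getName();
    String getVersion();
}

/** Illustrative identity of the psimi parser mentioned above. */
class PsimiParserIdentity implements ComponentIdentity {
    @Override
    public String getId() {
        return "psimi"; // short ID used to invoke the component in a workflow
    }

    @Override
    public String getName() {
        return "PSI Molecular Interaction Format Parser"; // full, human-readable name
    }

    @Override
    public String getVersion() {
        return "19.10.2010"; // hypothetical date of the last update
    }
}
```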

Arguments

All workflow components can require arguments. It is up to the developer whether a component requires arguments and, if so, how many. Argument requirements are defined using objects of the type 'ArgumentDefinition', which are explained in the next section.

Argument Definitions

Ondex's workflow API provides many different argument definition types, all of which can be found in the package net.sourceforge.ondex.args in the artifact net.sourceforge.ondex:workflow-api.

If none of these argument definitions suits an argument you require, you can write a new one. Create the package net.sourceforge.ondex.args in your project and create a new class that implements the interface net.sourceforge.ondex.args.ArgumentDefinition.

Specifying Argument Definitions

Each of these argument definitions can be configured with certain properties:

Example:

String argname = "Query";
String argdesc = "The query string";

ArgumentDefinition<?>[] args = new ArgumentDefinition<?>[]{
  //                           name   , description, required, defaultValue, multipleInstancesAllowed
  new StringArgumentDefinition(argname, argdesc    , true    , "SELECT *"  , false)
};

Each workflow component features a method called ArgumentDefinition<?>[] getArgumentDefinitions(), which returns an array of definitions like the one above. If no arguments are required, simply return an empty array:

return new ArgumentDefinition<?>[0];

Accessing arguments

To access the arguments that were specified by the user, each workflow component possesses a method getArguments(), which returns an object of the type net.sourceforge.ondex.AbstractArguments.

This object can be queried for the arguments' values. Unique arguments (multipleInstancesAllowed = false) can be accessed using the method Object getUniqueValue(String name). Lists of values for non-unique arguments are returned by the method List<Object> getObjectValueList(String name). The parameter 'name' refers to the name that was given to the corresponding argument definition. Depending on the type of workflow component, the argument access field can have different names. These names are introduced in the respective sections below.

Example:

String query = (String) getArguments().getUniqueValue("Query");
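To illustrate the difference between unique and list-valued arguments, here is a self-contained sketch. ArgumentValues is a hypothetical stand-in for the value-access part of AbstractArguments: only the names getUniqueValue and getObjectValueList come from this manual, while addValue, the map-based storage and the behaviour for missing arguments are invented for the example and may differ from the real class. readQuery adds a type check before the cast, so a misconfigured argument fails with a clear message instead of a ClassCastException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for AbstractArguments (value access only). */
class ArgumentValues {
    private final Map<String, List<Object>> values = new HashMap<String, List<Object>>();

    /** Invented for the sketch: records one value for the named argument. */
    void addValue(String name, Object value) {
        List<Object> list = values.get(name);
        if (list == null) {
            list = new ArrayList<Object>();
            values.put(name, list);
        }
        list.add(value);
    }

    /** Value of a unique argument (multipleInstancesAllowed = false); null if absent in this sketch. */
    Object getUniqueValue(String name) {
        List<Object> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    /** All values of a non-unique argument; empty if none were given in this sketch. */
    List<Object> getObjectValueList(String name) {
        List<Object> list = values.get(name);
        return list == null ? Collections.<Object>emptyList() : Collections.unmodifiableList(list);
    }
}

class QueryArgumentExample {
    /** Equivalent of the example above, with a type check before the cast. */
    static String readQuery(ArgumentValues args) {
        Object raw = args.getUniqueValue("Query");
        if (!(raw instanceof String)) {
            throw new IllegalArgumentException("argument 'Query' must be a String, got " + raw);
        }
        return (String) raw;
    }
}
```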

Accessing the Ondex graph

The Ondex graph that the workflow component works on is available as a field called ONDEXGraph graph. To find out how to access and manipulate the Ondex graph, please refer to the Ondex JavaDoc and the Ondex graph API manual.

Important: To use the Ondex graph in a way that is consistent with other Ondex applications, please follow the Ondex semantics guidelines.

Lookup functions (validators)

Requesting lookup functions

Lookup functions (called validators in Ondex) are very similar to workflow components. You tell the workflow engine which known validators your component needs by stating their identifiers. This is done by implementing the method String[] requiresValidators(), which returns an array of the IDs of the required validators. Every workflow component features this method. If you do not require any lookup functions, simply return an empty String array:

return new String[0];

Calling lookup functions

All Ondex validators are accessible through the static field net.sourceforge.ondex.config.ValidatorRegistry.validators, which is of the type HashMap<String,ONDEXValidator>. The workflow engine provides all validators requested in the requiresValidators() method (see above) in this hash map. The validator's ID is used as the hash key:

ONDEXValidator taxLookup = ValidatorRegistry.validators.get("taxonomy");

To use the validator, call its validate(Object o) method, which returns the converted object.

String ncbiTaxID = (String) taxLookup.validate("yeast");
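The request-then-lookup pattern can be sketched without Ondex on the classpath. LookupValidator, TaxonomyLookup and the static validators map below are hypothetical stand-ins for ONDEXValidator, a real taxonomy validator and ValidatorRegistry.validators; the one-entry name table is illustrative only (4932 is the NCBI taxonomy ID of Saccharomyces cerevisiae). The require method adds a guard that the plain HashMap.get call lacks: since the engine presumably only registers validators named in requiresValidators(), a missing entry produces a clear error rather than a NullPointerException later on.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for ONDEXValidator: converts an input object into a normalised form. */
interface LookupValidator {
    Object validate(Object o);
}

/** Hypothetical in-memory taxonomy lookup; the mapping table is illustrative only. */
class TaxonomyLookup implements LookupValidator {
    private final Map<String, String> nameToTaxId = new HashMap<String, String>();

    TaxonomyLookup() {
        nameToTaxId.put("yeast", "4932"); // NCBI taxonomy ID of Saccharomyces cerevisiae
    }

    @Override
    public Object validate(Object o) {
        return o == null ? null : nameToTaxId.get(o.toString().toLowerCase(Locale.ROOT));
    }
}

class ValidatorLookupExample {
    /** Mirrors the shape of ValidatorRegistry.validators: validator ID to validator. */
    static final Map<String, LookupValidator> validators = new HashMap<String, LookupValidator>();

    /** What a component would return from requiresValidators() to get the taxonomy lookup. */
    static String[] requiresValidators() {
        return new String[] { "taxonomy" };
    }

    /** Fetches a validator and fails clearly if it was not provided. */
    static LookupValidator require(String id) {
        LookupValidator v = validators.get(id);
        if (v == null) {
            throw new IllegalStateException("validator '" + id + "' not available; list it in requiresValidators()");
        }
        return v;
    }
}
```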

Indexing environment

The Ondex workflow engine can provide a Lucene based indexing environment for fast searches if needed.

Requesting search environment initialization

You can indicate that your component makes use of the search environment feature. Every workflow component possesses a method boolean requiresIndexedGraph(). Simply set its return value to true if you want to activate the search index.

Accessing the search environment

If you have instructed the workflow engine to create a search index over the current graph by returning true from requiresIndexedGraph(), an Apache Lucene environment will be provided for you. To access it, use the static field net.sourceforge.ondex.config.LuceneRegistry.sid2luceneEnv. It is a HashMap<Long,LuceneEnv> that uses the graph's super ID (SID) as key, so you access it as follows:

LuceneEnv env = LuceneRegistry.sid2luceneEnv.get(graph.getSID());

The LuceneEnv object provides the following methods to search inside the graph:

To learn more about how to use this feature refer to the Ondex Javadoc and the Apache Lucene documentation.

Writing a parser

Creation

Create a new Java class inside your package that extends the class net.sourceforge.ondex.parser.ONDEXParser and implement its abstract methods.

Your class should now look like this:

package net.sourceforge.ondex.parser.myparser;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.parser.ONDEXParser;

public class Parser extends ONDEXParser {
 
	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}
 
	@Override
	public String getId() {
		return null;
	}
	
	@Override
	public String getName() {
		return null;
	}
 
	@Override
	public String getVersion() {
		return null;
	}
 
	@Override
	public String[] requiresValidators() {
		return null;
	}
 
	@Override
	public void start() throws Exception {

	}
 
}

Input specification

You will want to specify whether your parser will read a single file, a directory or both. Simply set the return values for the methods boolean readsDirectory() and boolean readsFile() accordingly.

Implementation

In addition to the general argument values, the getArguments() method also gives access to the input file and/or input directory. This is done using the methods getArguments().getInputFile() and getArguments().getInputDir(), respectively.
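A sketch of the input handling a parser's start() method might perform. It is written against plain java.io so that it runs on its own: the java.io.File argument stands in for the value of getArguments().getInputFile() or getArguments().getInputDir() (their exact return types are an assumption here), and the tab-separated record format with '#' comments is purely hypothetical. inputFiles covers both cases from readsFile() and readsDirectory(), so the same loop works for a single file and for a directory.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical input handling for a parser's start() method. The File argument
 * stands in for the value of getArguments().getInputFile() or getInputDir().
 */
class ParserInputSketch {

    /** Returns the input file itself, or the regular files of an input directory sorted by path. */
    static List<File> inputFiles(File input) {
        List<File> files = new ArrayList<File>();
        if (input.isDirectory()) {
            File[] children = input.listFiles();
            if (children != null) {
                Arrays.sort(children); // deterministic processing order
                for (File child : children) {
                    if (child.isFile()) {
                        files.add(child);
                    }
                }
            }
        } else if (input.isFile()) {
            files.add(input);
        }
        return files;
    }

    /** Reads tab-separated records, skipping blank lines and '#' comment lines. */
    static List<String[]> readRecords(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        List<String[]> records = new ArrayList<String[]>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().length() == 0 || line.startsWith("#")) {
                    continue;
                }
                records.add(line.split("\t"));
            }
        } finally {
            reader.close(); // also closes the underlying reader
        }
        return records;
    }

    /** Convenience overload for a file on disk (platform default encoding). */
    static List<String[]> readRecords(File file) throws IOException {
        return readRecords(new FileReader(file));
    }
}
```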

Writing a Mapping

Creation

Create a new Java class inside your package that extends the class net.sourceforge.ondex.mapping.ONDEXMapping and override its abstract methods.

Your class should now look like this:


package net.sourceforge.ondex.mapping.mymapping;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.mapping.ONDEXMapping;

public class Mapping extends ONDEXMapping {

	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}

	@Override
	public String getName() {
		return null;
	}

	@Override
	public String getId() {
		return null;
	}

	@Override
	public String getVersion() {
		return null;
	}

	@Override
	public boolean requiresIndexedGraph() {
		return false;
	}

	@Override
	public String[] requiresValidators() {
		return null;
	}

	@Override
	public void start() throws Exception {

	}
}

Writing a filter

Creation

Create a new Java class inside your package that extends the class net.sourceforge.ondex.filter.ONDEXFilter and implement its abstract methods.

Your class should now look like this:

package net.sourceforge.ondex.filter.myfilter;

import java.util.Set;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.core.ONDEXConcept;
import net.sourceforge.ondex.core.ONDEXGraph;
import net.sourceforge.ondex.core.ONDEXRelation;
import net.sourceforge.ondex.filter.ONDEXFilter;

public class Filter extends ONDEXFilter {

	@Override
	public void copyResultsToNewGraph(ONDEXGraph exportGraph) {

	}

	@Override
	public String getId() {
		return null;
	}

	@Override
	public Set<ONDEXConcept> getVisibleConcepts() {
		return null;
	}

	@Override
	public Set<ONDEXRelation> getVisibleRelations() {
		return null;
	}

	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}

	@Override
	public String getName() {
		return null;
	}

	@Override
	public String getVersion() {
		return null;
	}

	@Override
	public boolean requiresIndexedGraph() {
		return false;
	}

	@Override
	public String[] requiresValidators() {
		return null;
	}

	@Override
	public void start() throws Exception {

	}

}

Output

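Filters do not manipulate the graph directly, but rather provide a view on the data. This is done through three special methods, getVisibleConcepts(), getVisibleRelations() and copyResultsToNewGraph(ONDEXGraph exportGraph), which expose the sets of visible concepts and relations.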

It is advisable to keep ONDEXBitSets of the filtered concepts and relations as private fields in your Filter class. This makes implementing the above methods much easier.
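The bit-set bookkeeping recommended above might look roughly like the following sketch. It uses java.util.BitSet as a stand-in for ONDEXBitSet and a hypothetical GraphElement class in place of ONDEXConcept or ONDEXRelation; the filtering criterion (keep elements of one type) is only an example. start() records the IDs that pass in a private bit set, and getVisible() plays the role of getVisibleConcepts() or getVisibleRelations(). The graph itself is never modified.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a graph element (concept or relation) with a non-negative integer ID. */
class GraphElement {
    final int id;
    final String type;

    GraphElement(int id, String type) {
        this.id = id;
        this.type = type;
    }
}

/** Sketch of a filter that keeps its result as a bit set of visible IDs. */
class TypeFilterSketch {
    private final BitSet visibleIds = new BitSet(); // stand-in for ONDEXBitSet
    private final List<GraphElement> all;

    TypeFilterSketch(List<GraphElement> all) {
        this.all = all;
    }

    /** Corresponds to start(): marks every element of the given type as visible. */
    void start(String keepType) {
        visibleIds.clear();
        for (GraphElement e : all) {
            if (e.type.equals(keepType)) {
                visibleIds.set(e.id); // BitSet requires non-negative IDs
            }
        }
    }

    /** Corresponds to getVisibleConcepts()/getVisibleRelations(): a read-only view of the result. */
    Set<GraphElement> getVisible() {
        Set<GraphElement> result = new LinkedHashSet<GraphElement>();
        for (GraphElement e : all) {
            if (visibleIds.get(e.id)) {
                result.add(e);
            }
        }
        return Collections.unmodifiableSet(result);
    }
}
```

A bit set keeps membership tests cheap and memory use small even for large graphs, and the visible sets are only materialised when the workflow engine asks for them.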

Writing a transformer

Creation

Create a new Java class inside your package that extends the class net.sourceforge.ondex.transformer.ONDEXTransformer and override its abstract methods.

Your class should now look like this:

package net.sourceforge.ondex.transformer.mytransformer;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.transformer.ONDEXTransformer;

public class Transformer extends ONDEXTransformer {

	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}

	@Override
	public String getName() {
		return null;
	}

	@Override
	public String getVersion() {
		return null;
	}

	@Override
	public boolean requiresIndexedGraph() {
		return false;
	}

	@Override
	public String[] requiresValidators() {
		return null;
	}

	@Override
	public void start() throws Exception {

	}
}

Writing a statistics method

Creation

Create a new Java class inside your package that extends the class net.sourceforge.ondex.statistics.ONDEXStatistics and override its abstract methods.

Your class should now look like this:

package net.sourceforge.ondex.statistics.mystatistics;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.statistics.ONDEXStatistics;

public class Statistics extends ONDEXStatistics {

	@Override
	public ArgumentDefinition<?>[] getArgumentDefinitions() {
		return null;
	}

	@Override
	public String getName() {
		return null;
	}

	@Override
	public String getVersion() {
		return null;
	}

	@Override
	public boolean requiresIndexedGraph() {
		return false;
	}

	@Override
	public String[] requiresValidators() {
		return null;
	}

	@Override
	public void start() throws Exception {

	}
}

Compiling and running the new module

In order to build the new workflow module proceed as follows:

  1. Go back to your main project directory (that contains your pom.xml file).
  2. Build the project using the command:
    mvn install
  3. Once your project has been built successfully, you will find the Ondex plugin jar file in your project's target/ directory under the name <artifactID>-<version>-jar-with-dependencies.jar (where <artifactID> and <version> are the ID and version you stated in your pom.xml file).
  4. Copy the jar file into the plugins/ directory of your local Ondex installation. You can now run a workflow using your new components.
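If you rebuild often, the copy step can be scripted. The class below is a hypothetical helper, not part of Ondex: it derives the jar name described in step 3 and copies the jar from target/ into the plugins/ directory of an Ondex installation whose location you pass in, replacing any older copy. Unlike the module code, it uses java.nio.file and therefore needs Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper that installs a freshly built plugin jar into an Ondex installation. */
class PluginInstaller {

    /** Name of the jar produced by the build: <artifactID>-<version>-jar-with-dependencies.jar */
    static String pluginJarName(String artifactId, String version) {
        return artifactId + "-" + version + "-jar-with-dependencies.jar";
    }

    /** Copies projectDir/target/<jar> to ondexHome/plugins/<jar>, overwriting an older copy. */
    static Path install(Path projectDir, Path ondexHome, String artifactId, String version) throws IOException {
        String jar = pluginJarName(artifactId, version);
        Path source = projectDir.resolve("target").resolve(jar);
        Path pluginsDir = ondexHome.resolve("plugins");
        Files.createDirectories(pluginsDir); // no-op if the directory already exists
        return Files.copy(source, pluginsDir.resolve(jar), StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For an artifact with the hypothetical ID mymodule and version 0.1.3-SNAPSHOT, the copied file is plugins/mymodule-0.1.3-SNAPSHOT-jar-with-dependencies.jar.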