Ondex workflow module development manual
For Ondex version 0.1.3
Prerequisites and requirements
It is assumed that the reader of this document possesses appropriate prior knowledge about writing Java 1.6 programs, as well as the usage of the tools Subversion and Maven2.
To follow the steps described in this manual it is required to have an installation of Sun JDK1.6, the latest version of Subversion and Maven2 ready on a computer.
Preparations
The following steps are only required until the Ondex workflow engine is deployed to the Maven repository at Rothamsted Research.
- Check out the Ondex workflow engine from sourceforge.net using the command:
svn co https://ondex.svn.sourceforge.net/svnroot/ondex/trunk/ondex_parent ondex_parent
- Change into the resulting directory ondex_parent and compile the contents using the command:
mvn install
Your local maven repository now contains all required Ondex libraries.
Creating a new workflow module
- Create a new maven project by creating a new folder containing a pom file. Specify the following artifact as the pom's parent:
  - Group ID: net.sourceforge.ondex
  - Artifact ID: modules
  - Version: 0.1.3-SNAPSHOT
- Add the following snapshot repository reference to your maven installation's settings.xml:
http://ondex.rothamsted.bbsrc.ac.uk/nexus/content/groups/public-snapshots
- Create the standard maven java directory in your new folder (src/main/java).
- Create a new Java package inside src/main/java.
Dealing with Maven bugs
There are currently a few maven bugs that turn the creation of new workflow components into quite a hassle. To save you from unnecessary frustration we will list a few hacks here that help bypass these troubles.
Artifact version bug
Maven currently suffers from a bug that does not allow an artifact to have a version number that differs from its parent's version number. Thus your artifact must match Ondex's latest version number: 0.1.3-SNAPSHOT.
Parent dependency resolution bug
Another bug in maven currently prevents it from correctly resolving the download URL of parent dependencies. This means that your builds will fail unless you follow the instructions below:
- Follow this link in your browser. You will see a listing of all snapshot builds of the Ondex modules artifact. The file names contain identifiers for the different snapshot builds, consisting of the date, time and number of the build (for example, 20101019.002606-323 is build number 323 from 19/10/2010, 00:26:06).
- Scroll to the bottom of the page to find the latest build and copy its build identifier.
- In your pom file, go to the version field of the parent declaration and replace the string "SNAPSHOT" with the build ID you just copied from the browser.
- Build your artifact using the command
mvn install
It should now succeed in downloading the required dependencies.
- After your build has succeeded, you can change the parent version field back to what it was before. It should now build just fine.
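The build identifier format and the version substitution described above can be sketched in a few lines of Java. The class SnapshotBuildId and its methods are hypothetical helpers written for this illustration; they are not part of Ondex or Maven.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Ondex or Maven) that splits a snapshot
// build identifier such as 20101019.002606-323 into its three parts.
final class SnapshotBuildId {
    // Layout: yyyyMMdd.HHmmss-buildNumber
    private static final Pattern FORMAT =
            Pattern.compile("(\\d{8})\\.(\\d{6})-(\\d+)");

    final String date;      // yyyyMMdd
    final String time;      // HHmmss
    final int buildNumber;

    private SnapshotBuildId(String date, String time, int buildNumber) {
        this.date = date;
        this.time = time;
        this.buildNumber = buildNumber;
    }

    static SnapshotBuildId parse(String id) {
        Matcher m = FORMAT.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a snapshot build identifier: " + id);
        }
        return new SnapshotBuildId(m.group(1), m.group(2), Integer.parseInt(m.group(3)));
    }

    // Replaces the "SNAPSHOT" qualifier of a version with a concrete build ID.
    static String pinVersion(String snapshotVersion, String buildId) {
        parse(buildId); // fail early on a malformed identifier
        return snapshotVersion.replace("SNAPSHOT", buildId);
    }
}
```

Calling SnapshotBuildId.pinVersion("0.1.3-SNAPSHOT", "20101019.002606-323") yields the string to place in the parent version field for the first build.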
Using external libraries
As a general rule one can make use of any external library by defining it as a dependency in the pom file. During compile time all dependencies will be automatically downloaded and merged into the resulting plug-in jar file.
If a required library is already used by the Ondex workflow engine, it is of course not necessary to merge it into the jar file. In this case, simply set its dependency scope to provided. Here is a list of all libraries that are already in use:
- blowfish:blowfishj:2.14
- commons-cli:commons-cli:1.2
- com.sleepycat:je:3.3.75
- com.thoughtworks.xstream:xstream:1.3.1
- fastutil:fastutil:5.1.5
- log4j:log4j:1.2.12
- net.sourceforge.ondex:oxl-module:0.0.1
- net.sourceforge.ondex:tools:0.0.1
- org.apache.lucene:lucene-core:2.4.0
- org.codehaus.staxmate:stax2:2.1
- org.codehaus.woodstox:stax2-api:3.0.1
- org.codehaus.woodstox:woodstox-core-lgpl:4.0.3
- stax:stax-api:1.0.1
- xerces:xercesImpl:2.9.1
- xml-apis:xml-apis:1.3.04
- xpp3:xpp3_min:1.1.4c
Creating a workflow component
There are several different types of workflow components in Ondex:
- Parsers: A parser is a workflow component that creates the contents of an Ondex graph, according to information in a set of files. It usually targets a specific database or file format. The KEGG parser, or the SBML parser are only a few examples.
- Mapping: Mappings create relations between different parts of an Ondex graph. For example, the 'accession-based' mapping usually maps concepts from different data sources to one another. The 'blast-based' mapping creates relations between sequence-bearing concepts according to their similarity.
- Filters: A filter extracts a subset from the graph. For example the all-pairs shortest path filter yields all concepts and the subset of relations that are part of shortest paths between any two concepts.
- Transformers: A transformer is a workflow component that transforms the graph from one configuration into another. For example: The relation collapse transformer merges concepts that are connected by a certain relation.
- Statistics: This special kind of workflow component runs statistical analyses on the graph and outputs them to the file system. For example: The GOA quality statistic outputs a specificity analysis over GeneOntology annotations.
For more information about the different workflow components, please see the workflow component documentation.
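To make the idea of a mapping more concrete, here is a heavily simplified sketch of an accession-based mapping. It does not use the Ondex graph API: ConceptStub and AccessionMappingSketch are hypothetical stand-ins, and the real mapping may apply additional rules. The sketch only pairs concepts that share an accession but come from different data sources.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a concept: an id, its data source and one accession.
final class ConceptStub {
    final int id;
    final String dataSource;
    final String accession;

    ConceptStub(int id, String dataSource, String accession) {
        this.id = id;
        this.dataSource = dataSource;
        this.accession = accession;
    }
}

class AccessionMappingSketch {
    // Returns id pairs {from, to} for concepts from different data sources
    // that carry the same accession.
    static List<int[]> map(List<ConceptStub> concepts) {
        Map<String, List<ConceptStub>> byAccession = new HashMap<String, List<ConceptStub>>();
        List<int[]> relations = new ArrayList<int[]>();
        for (ConceptStub c : concepts) {
            List<ConceptStub> seen = byAccession.get(c.accession);
            if (seen == null) {
                seen = new ArrayList<ConceptStub>();
                byAccession.put(c.accession, seen);
            }
            // Relate the new concept to every earlier one with the same accession,
            // skipping concepts from its own data source.
            for (ConceptStub other : seen) {
                if (!other.dataSource.equals(c.dataSource)) {
                    relations.add(new int[] { other.id, c.id });
                }
            }
            seen.add(c);
        }
        return relations;
    }
}
```

Grouping by accession first keeps the comparison local to concepts that can actually match, instead of comparing every pair of concepts in the graph.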
Common technical aspects
All workflow components have some important common technical aspects.
ID, name and version information
Each workflow component features three methods called String getId(), String getName() and String getVersion().
The getId() method is used to define the ID of your component, which will be used to invoke and execute it.
The getName() method allows you to state a full name for your component.
For example: 'PSI Molecular Interaction Format Parser' would be the name for the Parser with ID psimi.
In the getVersion() method you can specify a version tag, usually the date of the last update.
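A minimal sketch of the three methods for the PSI-MI example follows. WorkflowComponentStub is a hypothetical stand-in for the Ondex base classes so that the snippet compiles on its own, and the version string is only an illustration of a date-based tag.

```java
// Hypothetical stand-in for the base class of an Ondex workflow component.
abstract class WorkflowComponentStub {
    public abstract String getId();
    public abstract String getName();
    public abstract String getVersion();
}

class PsiMiParser extends WorkflowComponentStub {
    @Override
    public String getId() {
        return "psimi"; // short ID used to invoke the component
    }

    @Override
    public String getName() {
        return "PSI Molecular Interaction Format Parser"; // full, human-readable name
    }

    @Override
    public String getVersion() {
        return "19.10.2010"; // illustrative date of the last update
    }
}
```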
Arguments
Any workflow component can require arguments. The developer decides whether the component takes arguments and, if so, how many. Argument requirements are defined using objects of the type 'ArgumentDefinition', which are explained in the next section.
Argument Definitions
Ondex's workflow API provides a set of different argument definition types, all of which can be found in the package net.sourceforge.ondex.args in the artifact net.sourceforge.ondex:workflow-api.
- String Argument Definition
- Integer Range Argument Definition
- Float Range Argument Definition
- Boolean Argument Definition
- Sequence-type Argument Definition
- File Argument Definition
- URL Argument Definition
If an argument you require does not suit any of these Argument Definitions, it is possible to write a new one. Create the package net.sourceforge.ondex.args in your project and create a new class that implements the interface net.sourceforge.ondex.args.ArgumentDefinition.
Specifying Argument Definitions
Each of these argument definitions can be configured with certain properties:
- The argument's name
- The argument's description
- Its default value
- Whether it is mandatory
- Whether multiple instances of the argument are allowed
Example:
String argname = "Query";
String argdesc = "The query string";
ArgumentDefinition<?>[] args = new ArgumentDefinition<?>[] {
    // name, description, required, defaultValue, multipleInstancesAllowed
    new StringArgumentDefinition(argname, argdesc, true, "SELECT *", false)
};
Each workflow component features a method called ArgumentDefinition<?>[] getArgumentDefinitions() which can be used to return a set of definitions like the one above. If no arguments are required, simply let it return an empty array like this:
return new ArgumentDefinition<?>[0];
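The following sketch shows a getArgumentDefinitions() implementation that declares one mandatory unique argument and one optional repeatable argument. The two types at the top are stand-ins so that the snippet compiles without Ondex: the constructor order follows the example above, while the accessor names and the "Exclude" argument are assumptions made for this illustration.

```java
// Stand-in for net.sourceforge.ondex.args.ArgumentDefinition; accessor names are assumed.
interface ArgumentDefinition<T> {
    String getName();
    boolean isRequired();
    boolean allowsMultipleInstances();
    T getDefaultValue();
}

// Stand-in that mirrors the constructor order used in the example above.
class StringArgumentDefinition implements ArgumentDefinition<String> {
    private final String name;
    private final String description;
    private final boolean required;
    private final String defaultValue;
    private final boolean multiple;

    StringArgumentDefinition(String name, String description, boolean required,
                             String defaultValue, boolean multipleInstancesAllowed) {
        this.name = name;
        this.description = description;
        this.required = required;
        this.defaultValue = defaultValue;
        this.multiple = multipleInstancesAllowed;
    }

    public String getName() { return name; }
    public String getDescription() { return description; }
    public boolean isRequired() { return required; }
    public boolean allowsMultipleInstances() { return multiple; }
    public String getDefaultValue() { return defaultValue; }
}

class QueryComponent {
    // Constants keep the names used for definition and later access in sync.
    static final String QUERY_ARG = "Query";
    static final String EXCLUDE_ARG = "Exclude"; // hypothetical second argument

    public ArgumentDefinition<?>[] getArgumentDefinitions() {
        return new ArgumentDefinition<?>[] {
            // mandatory, unique, with a default value
            new StringArgumentDefinition(QUERY_ARG, "The query string", true, "SELECT *", false),
            // optional, may be given several times, no default value
            new StringArgumentDefinition(EXCLUDE_ARG, "A term to exclude", false, null, true)
        };
    }
}
```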
Accessing arguments
To access the arguments that were specified by the user, each workflow component possesses a method getArguments(), which returns an object of the type net.sourceforge.ondex.AbstractArguments.
This object can be queried for the argument's values. Unique arguments (multipleInstancesAllowed = false) can be accessed using the method Object getUniqueValue(String name). Lists of values for non-unique arguments are returned by the method List<Object> getObjectValueList(String name). The parameter 'name' refers to the name that was given to the corresponding Argument Definition. Depending on the type of workflow component, the argument access field can have different names. These names are introduced in the respective sections below.
Example:
String query = (String) getArguments().getUniqueValue("Query");
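A slightly more defensive way to read both kinds of arguments is sketched below. ArgumentsStub is a stand-in that offers only the two accessors named above; its addOption method, the "Exclude" argument, and the behaviour for unset arguments (null and an empty list) are assumptions, not documented Ondex behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the arguments object; only the two accessors come from the text.
class ArgumentsStub {
    private final Map<String, List<Object>> values = new HashMap<String, List<Object>>();

    // Hypothetical setter, used here only to fill the stub.
    void addOption(String name, Object value) {
        List<Object> list = values.get(name);
        if (list == null) {
            list = new ArrayList<Object>();
            values.put(name, list);
        }
        list.add(value);
    }

    Object getUniqueValue(String name) {
        List<Object> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    List<Object> getObjectValueList(String name) {
        List<Object> list = values.get(name);
        return list == null ? new ArrayList<Object>() : list;
    }
}

class ArgumentReader {
    // Reads a unique argument and checks its type before casting.
    static String readQuery(ArgumentsStub args) {
        Object raw = args.getUniqueValue("Query");
        if (!(raw instanceof String)) {
            throw new IllegalArgumentException("Argument 'Query' is missing or not a String");
        }
        return (String) raw;
    }

    // Reads all values of a repeatable argument as Strings.
    static List<String> readExcludes(ArgumentsStub args) {
        List<String> result = new ArrayList<String>();
        for (Object o : args.getObjectValueList("Exclude")) {
            result.add(String.valueOf(o));
        }
        return result;
    }
}
```

Checking the type before the cast turns a confusing ClassCastException into an error message that names the offending argument.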
Accessing the Ondex graph
The Ondex graph that the workflow component is working on is referenced as a field called ONDEXGraph graph. To find out how to access and manipulate the Ondex graph, please refer to the Ondex JavaDoc and the Ondex graph API manual.
Important: To ensure that you use the Ondex graph in a way that is consistent with other Ondex applications, please ensure that you follow the Ondex semantics guidelines.
Lookup functions (validators)
Requesting lookup functions
Lookup functions (called validators in Ondex) are very similar to workflow components. You can indicate your need of assistance from any known validators to the workflow engine by stating their identifiers. This is done by implementing the method String[] requiresValidators(), where the return value is an array of the IDs of the required validators. Every workflow component features this method. If you do not require any lookup functions, simply return an empty String array:
return new String[0];
Calling lookup functions
All Ondex validators are accessible through the static field net.sourceforge.ondex.config.ValidatorRegistry.validators, which is of the type HashMap<String,ONDEXValidator>. The workflow engine provides all validators requested in the requiresValidators() method (see above) in this hash map. The validator's ID is used as the hash key:
ONDEXValidator taxLookup = ValidatorRegistry.validators.get("taxonomy");
To use the validator simply call its validate(Object o) method. It will return the converted Object.
String ncbiTaxID = (String) taxLookup.validate("yeast");
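Requesting and calling a validator can be combined as in the sketch below. The interface and registry are stand-ins that mimic only what is described above, TaxonomyStub is an illustrative validator with a one-entry lookup table, and returning null for unknown input is an assumption about validator behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the validator type; only validate(Object) is taken from the text.
interface ONDEXValidator {
    Object validate(Object o);
}

// Stand-in for net.sourceforge.ondex.config.ValidatorRegistry.
class ValidatorRegistry {
    static final HashMap<String, ONDEXValidator> validators =
            new HashMap<String, ONDEXValidator>();
}

// Illustrative validator with a tiny lookup table (4932 is the NCBI taxonomy ID
// of Saccharomyces cerevisiae).
class TaxonomyStub implements ONDEXValidator {
    private final Map<String, String> table = new HashMap<String, String>();

    TaxonomyStub() {
        table.put("yeast", "4932");
    }

    public Object validate(Object o) {
        return table.get(String.valueOf(o).toLowerCase());
    }
}

class TaxonomyAwareComponent {
    // Tells the workflow engine which validators must be available.
    public String[] requiresValidators() {
        return new String[] { "taxonomy" };
    }

    // Looks up a taxonomy ID and fails clearly if the validator is absent.
    String resolveTaxId(String organism) {
        ONDEXValidator taxLookup = ValidatorRegistry.validators.get("taxonomy");
        if (taxLookup == null) {
            throw new IllegalStateException("Validator 'taxonomy' was not provided");
        }
        Object result = taxLookup.validate(organism);
        return result == null ? null : result.toString();
    }
}
```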
Indexing environment
The Ondex workflow engine can provide a Lucene based indexing environment for fast searches if needed.
Requesting search environment initialization
You can indicate that your component makes use of the search environment feature. Every workflow component possesses a method boolean requiresIndexedGraph(). Simply set its return value to true if you want to activate the search index.
Accessing the search environment
If you have instructed the workflow engine to create a search index over the current graph by returning true from requiresIndexedGraph(), an Apache Lucene environment will be provided for you. To access it, use the static field net.sourceforge.ondex.config.LuceneRegistry.sid2luceneEnv. It is a HashMap<Long,LuceneEnv> that uses the graph's super ID (SID) as key. Thus you access it as follows:
LuceneEnv env = LuceneRegistry.sid2luceneEnv.get(graph.getSID());
The LuceneEnv object provides the following methods to search inside the graph:
- scoredSearchInConcepts(Query q)
- scoredSearchInRelations(Query q)
To learn more about how to use this feature refer to the Ondex Javadoc and the Apache Lucene documentation.
Writing a parser
Creation
Create a new Java class inside your package that extends the class net.sourceforge.ondex.parser.ONDEXParser and implement its abstract methods.
Your class should now look like this:
package net.sourceforge.ondex.parser.myparser;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.parser.ONDEXParser;

public class Parser extends ONDEXParser {

    @Override
    public ArgumentDefinition<?>[] getArgumentDefinitions() {
        return null;
    }

    @Override
    public String getId() {
        return null;
    }

    @Override
    public String getName() {
        return null;
    }

    @Override
    public String getVersion() {
        return null;
    }

    @Override
    public String[] requiresValidators() {
        return null;
    }

    @Override
    public void start() throws Exception {
    }
}
Input specification
You will want to specify whether your parser will read a single file, a directory or both. Simply set the return values for the methods boolean readsDirectory() and boolean readsFile() accordingly.
Implementation
In addition to the general argument values, the getArguments() method also allows access to the input file and/or input directory. This is done using the methods getArguments().getInputFile() and getArguments().getInputDir(), respectively.
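The sketch below shows how a parser that reads a single file might declare its input type and open the file inside start(). ParserArgumentsStub and TabFileParser are hypothetical; in particular, the String return type of getInputFile() and the use of a plain list in place of the Ondex graph are assumptions made so that the snippet is self-contained.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the parser's arguments object; the String return type is assumed.
class ParserArgumentsStub {
    private final String inputFile;

    ParserArgumentsStub(String inputFile) {
        this.inputFile = inputFile;
    }

    String getInputFile() {
        return inputFile;
    }
}

// Sketch of a parser that reads one tab-separated file.
class TabFileParser {
    private final ParserArgumentsStub arguments;
    // Stand-in for the graph: collects the first column of every data line.
    final List<String> conceptNames = new ArrayList<String>();

    TabFileParser(ParserArgumentsStub arguments) {
        this.arguments = arguments;
    }

    ParserArgumentsStub getArguments() {
        return arguments;
    }

    public boolean readsFile() {
        return true;  // this parser expects a single input file
    }

    public boolean readsDirectory() {
        return false; // it does not scan a directory
    }

    public void start() throws Exception {
        File input = new File(getArguments().getInputFile());
        if (!input.isFile()) {
            throw new IOException("Not a readable file: " + input);
        }
        BufferedReader reader = new BufferedReader(new FileReader(input));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Skip blank lines and comment lines.
                if (line.trim().length() == 0 || line.startsWith("#")) {
                    continue;
                }
                conceptNames.add(line.split("\t")[0]);
            }
        } finally {
            reader.close();
        }
    }
}
```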
Writing a Mapping
Creation
Create a new Java class inside your package that extends the class net.sourceforge.ondex.mapping.ONDEXMapping and override its abstract methods.
Your class should now look like this:
package net.sourceforge.ondex.mapping.mymapping;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.mapping.ONDEXMapping;

public class Mapping extends ONDEXMapping {

    @Override
    public ArgumentDefinition<?>[] getArgumentDefinitions() {
        return null;
    }

    @Override
    public String getName() {
        return null;
    }

    @Override
    public String getId() {
        return null;
    }

    @Override
    public String getVersion() {
        return null;
    }

    @Override
    public boolean requiresIndexedGraph() {
        return false;
    }

    @Override
    public String[] requiresValidators() {
        return null;
    }

    @Override
    public void start() throws Exception {
    }
}
Writing a filter
Creation
Create a new Java class inside your package that extends the class net.sourceforge.ondex.filter.ONDEXFilter and implement its abstract methods.
Your class should now look like this:
package net.sourceforge.ondex.filter.myfilter;

import java.util.Set;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.core.ONDEXConcept;
import net.sourceforge.ondex.core.ONDEXGraph;
import net.sourceforge.ondex.core.ONDEXRelation;
import net.sourceforge.ondex.filter.ONDEXFilter;

public class Filter extends ONDEXFilter {

    @Override
    public void copyResultsToNewGraph(ONDEXGraph exportGraph) {
    }

    @Override
    public String getId() {
        return null;
    }

    @Override
    public Set<ONDEXConcept> getVisibleConcepts() {
        return null;
    }

    @Override
    public Set<ONDEXRelation> getVisibleRelations() {
        return null;
    }

    @Override
    public ArgumentDefinition<?>[] getArgumentDefinitions() {
        return null;
    }

    @Override
    public String getName() {
        return null;
    }

    @Override
    public String getVersion() {
        return null;
    }

    @Override
    public boolean requiresIndexedGraph() {
        return false;
    }

    @Override
    public String[] requiresValidators() {
        return null;
    }

    @Override
    public void start() throws Exception {
    }
}
Output
Filters do not manipulate the graph directly, but rather provide a view on the data. This is done by providing Sets of visible concepts and relations using three special methods:
- void copyResultsToNewGraph(ONDEXGraph exportGraph)
- Set<ONDEXConcept> getVisibleConcepts()
- Set<ONDEXRelation> getVisibleRelations()
It is advisable to keep ONDEXBitSets of the filtered concepts and relations as private fields in your Filter class. This makes implementing the above methods much easier.
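The pattern of computing the result once in start() and serving it from private fields can be sketched as follows. The example does not use the Ondex API: Edge and ScoreFilter are hypothetical, concepts are represented by integer ids, a list of edges stands in for the graph, and ordinary java.util sets take the place of ONDEXBitSets.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a relation between two concepts (integer ids).
final class Edge {
    final int from;
    final int to;
    final double score;

    Edge(int from, int to, double score) {
        this.from = from;
        this.to = to;
        this.score = score;
    }
}

// Sketch of a filter that keeps relations at or above a score threshold,
// together with the concepts they connect.
class ScoreFilter {
    private final List<Edge> graph;   // stand-in for the graph field
    private final double threshold;   // would normally come from an argument

    // Results are kept as private fields and filled once in start().
    private final Set<Integer> visibleConcepts = new LinkedHashSet<Integer>();
    private final Set<Edge> visibleRelations = new LinkedHashSet<Edge>();

    ScoreFilter(List<Edge> graph, double threshold) {
        this.graph = graph;
        this.threshold = threshold;
    }

    public void start() {
        for (Edge e : graph) {
            if (e.score >= threshold) {
                visibleRelations.add(e);
                visibleConcepts.add(e.from);
                visibleConcepts.add(e.to);
            }
        }
    }

    // The three output methods only read the stored result.
    public Set<Integer> getVisibleConcepts() {
        return Collections.unmodifiableSet(visibleConcepts);
    }

    public Set<Edge> getVisibleRelations() {
        return Collections.unmodifiableSet(visibleRelations);
    }

    public void copyResultsToNewGraph(List<Edge> exportGraph) {
        exportGraph.addAll(visibleRelations);
    }
}
```

Because the original graph is never modified, the same filter result can be queried repeatedly or copied into a new graph without recomputation.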
Writing a transformer
Creation
Create a new Java class inside your package that extends the class net.sourceforge.ondex.transformer.ONDEXTransformer and override its abstract methods.
Your class should now look like this:
package net.sourceforge.ondex.transformer.mytransformer;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.transformer.ONDEXTransformer;

public class Transformer extends ONDEXTransformer {

    @Override
    public ArgumentDefinition<?>[] getArgumentDefinitions() {
        return null;
    }

    @Override
    public String getId() {
        return null;
    }

    @Override
    public String getName() {
        return null;
    }

    @Override
    public String getVersion() {
        return null;
    }

    @Override
    public boolean requiresIndexedGraph() {
        return false;
    }

    @Override
    public String[] requiresValidators() {
        return null;
    }

    @Override
    public void start() throws Exception {
    }
}
Writing a statistics method
Creation
Create a new Java class inside your package that extends the class net.sourceforge.ondex.statistics.ONDEXStatistics and override its abstract methods.
Your class should now look like this:
package net.sourceforge.ondex.statistics.mystatistics;

import net.sourceforge.ondex.args.ArgumentDefinition;
import net.sourceforge.ondex.statistics.ONDEXStatistics;

public class Statistics extends ONDEXStatistics {

    @Override
    public ArgumentDefinition<?>[] getArgumentDefinitions() {
        return null;
    }

    @Override
    public String getId() {
        return null;
    }

    @Override
    public String getName() {
        return null;
    }

    @Override
    public String getVersion() {
        return null;
    }

    @Override
    public boolean requiresIndexedGraph() {
        return false;
    }

    @Override
    public String[] requiresValidators() {
        return null;
    }

    @Override
    public void start() throws Exception {
    }
}
Compiling and running the new module
In order to build the new workflow module proceed as follows:
- Go back to your main project directory (the one that contains your pom.xml file).
- Build the project using the command:
mvn install
- Once your project has been built successfully, you will find the Ondex plugin jar file in your project's target/ directory under the name <artifactID>-<version>-jar-with-dependencies.jar (where <artifactID> and <version> are the ID and version you stated in your pom.xml file).
- Copy the jar file into the plugins/ directory of your local Ondex installation. You can now run a workflow using your new components.