Ondex graph API manual

For Ondex version 0.1.3

Introduction

This manual was written to help developers acquaint themselves with the Ondex graph API.

The background section provides a description of the idea behind the Ondex graph data structure and gives a short overview of the different available graph implementations.

The main section discusses the details of the API's structure and how to use it. First the general aspects of handling graph elements are discussed. Subsequently, the role of the metadata behind graphs is explained. Finally, some advanced features like trinary relations, the permissions system and logging will be discussed.

Background

The Ondex graph data structure

The graph, concepts and relations

Ondex employs an enriched graph data structure to store any kind of integrated information. The data structure primarily consists of graph nodes, called 'concepts', and edges that connect them, called 'relations'. Each concept and each relation must be of a certain type, called 'concept class' for concepts and 'relation type' for relations. For example, imagine a concept of type 'Gene' being connected via a relation of type 'encodes' to another concept of type 'Protein'.

Provenance and evidence

Ondex features controlled sets of terms for provenance and evidence information, called 'CVs' and 'evidence types'.

'CVs' describe datasources from which data was imported. Every concept must be associated with a 'CV' reference. For example, the 'Protein' concept could have been imported from UniprotKB, which would be indicated by assigning the 'UniprotKB' CV element to the concept.

'Evidence types' on the other hand describe the evidence that supports the information represented by a concept or relation. Thus, all concepts and relations carry at least one evidence type reference. For example, if a relation has been determined using an accession mapping method, it would have the 'accession_based_mapping' evidence type attached.

Attributes

Concepts and relations can carry further attributes. The most important attributes are the GDS (General Data Storage) elements, which allow the attachment of any kind of Java object to concepts or relations. GDS elements are identified by 'attribute names' which are again part of a controlled set. These attribute names not only provide a context for their respective GDS elements, but also determine their type (i.e. Java class). For example a relation 'has_similar_sequence' between two protein concepts might carry the GDS with a value of 0.00001 under the attribute name 'e-value' which determines it to be of type java.lang.Double.
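The way an attribute name fixes the Java class of its values can be pictured with a small stand-alone sketch. The classes AttributeNameSketch and AttributeCarrier below are hypothetical stand-ins invented for this illustration; they are not the Ondex interfaces, and the real API may enforce the type in a different way.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in types for illustration; these are NOT the Ondex API classes.
class TypedAttributeSketch {

    /** Stand-in for an attribute name: an id plus the Java class its values must have. */
    static final class AttributeNameSketch {
        final String id;
        final Class<?> dataType;

        AttributeNameSketch(String id, Class<?> dataType) {
            this.id = id;
            this.dataType = dataType;
        }
    }

    /** Stand-in for a concept or relation that carries typed attributes. */
    static final class AttributeCarrier {
        private final Map<String, Object> values = new HashMap<>();

        /** Stores a value, rejecting it if its class does not match the attribute name. */
        void put(AttributeNameSketch name, Object value) {
            if (!name.dataType.isInstance(value)) {
                throw new IllegalArgumentException(
                    "Attribute '" + name.id + "' expects " + name.dataType.getName());
            }
            values.put(name.id, value);
        }

        /** Returns the stored value cast to the requested class, or null if absent. */
        <T> T get(AttributeNameSketch name, Class<T> type) {
            return type.cast(values.get(name.id));
        }
    }

    public static void main(String[] args) {
        // the 'e-value' attribute name determines that values are of type java.lang.Double
        AttributeNameSketch eValue = new AttributeNameSketch("e-value", Double.class);
        AttributeCarrier relation = new AttributeCarrier();
        relation.put(eValue, 0.00001);
        System.out.println(relation.get(eValue, Double.class));
    }
}
```

In this sketch a String stored under 'e-value' would be rejected with an IllegalArgumentException, which is one plausible way of keeping values consistent with their attribute name.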

Other than GDS elements, there are also certain special attributes for Ondex concepts. Cross references to external databases are stored in so-called 'concept accession' objects. These cross references may point to the database from which the concept was originally imported or any other. Concept accessions consist of two parts: a CV object specifying the database and the actual entry identifier. Our example protein concept from UniProtKB would carry a concept accession pointing to a UniProtKB entry.

Another special attribute is the 'concept name'. A concept can have more than one 'concept name', allowing synonyms to be added. A boolean flag can be used to make a 'concept name' a preferred name. This is mainly used for visualisation purposes. The example protein concept could have the preferred name 'Cdc13p' and the synonym 'YDL220C'.

Implementations

The Ondex API is merely a collection of Java interfaces that define the data structure and its access methods. The way in which the data is physically represented and stored is not determined by the API, but by its implementation. There are several different implementations of the Ondex graph API available. Most notably:

New, experimental implementations are currently under development, including:

Using the Ondex graph API

The Ondex graph API comes as the maven artifact net.sourceforge.ondex.core:api. The latest snapshot can be downloaded from the Rothamsted Research Nexus server. The main interface classes are found in the package net.sourceforge.ondex.core. Hence, if your IDE does not handle import statements automatically, you can import the complete contents of the package directly:

import net.sourceforge.ondex.core.*;

There are additional documents available that capture the technical details of the Ondex graph API:

Ondex API UML diagram

Ondex API UML class diagram

Graphs, concepts and relations

In the Ondex API the graph is represented by the class ONDEXGraph. Every Ondex graph is identified by a unique long integer ID called 'SID'. It can be queried from the graph and also from all of its elements so there can be no confusion as to which graph a given concept or relation belongs. The SID is retrieved using the method getSID(), defined in the interface 'ONDEXAssociable' for this purpose only (See class diagram).

Concepts and relations are modeled by the two classes ONDEXConcept and ONDEXRelation. They can be accessed from the ONDEXGraph object using several querying methods. Like the graph itself, they also feature unique IDs which can be queried using the getId() method. Unlike the SID, these IDs are plain integers. Concepts and relations inherit this and other methods from their common super-interface 'ONDEXEntity'.

Example code snippet:

/**
 * A small example method that would check if the SID of a concept
 * is consistent with the graph that owns it.
 */
public void assertSidConsistencyForConcept(int conceptID) {
  long graphSID = graph.getSID();
  ONDEXConcept c = graph.getConcept(conceptID);
  assert(c.getSID() == graphSID);
}

Network browsing: Relations are directed connections between two concepts, which can be accessed with the two methods 'getFromConcept()' and 'getToConcept()'. Furthermore, to find out what relations are adjacent to a given concept, you can use the graph's 'getRelationsOfConcept()' method. These methods are the elementary building blocks for any implementations of network traversal algorithms in Ondex.

Types of concepts and relations: As mentioned above, both concepts and relations need to have hierarchical types. The getOfType() method, which they inherit from their common interface Instantiation<H extends Hierarchy>, can be used to access them. When called on a concept, the method returns a ConceptClass object; when called on a relation, it returns a RelationType object. The Instantiation<H extends Hierarchy> interface also provides them with a method called inheritedFrom(). This method can be used to determine whether a concept or relation is of a given type. More about concept classes, relation types, hierarchies and their instantiations can be found in the metadata section.

Provenance and evidence: The source from which a concept has been obtained is represented by the class 'CV'. Calling the getElementOf() method on a concept will return an object of that type. To access a concept's or relation's evidence, the getEvidence() method can be called, which will return a Set<EvidenceType> object. More about 'EvidenceType' and 'CV' can be found in the metadata section.

Example code snippet:

/**
 * Example method that prints out information about the immediate
 * neighbourhood of a concept.
 */
public void printNeighbourhoodOfConcept(int conceptID) {

  ONDEXConcept concept = graph.getConcept(conceptID);

  System.out.println("Concept #" + conceptID +" of type " + concept.getOfType().getFullname() +
                     " from " + concept.getElementOf().getFullname() + " has the following connections:");

  for (ONDEXRelation relation : graph.getRelationsOfConcept(concept)) {

    ONDEXConcept neighbour;

    //Find out whether it is an incoming or outgoing relation.
    if (relation.getFromConcept().equals(concept)) {
      //Outgoing relation

      neighbour = relation.getToConcept();

      System.out.println(relation.getOfType().getFullname() + " concept #" + neighbour.getId() +
                         " of type " + neighbour.getOfType().getFullname() + " from source " +
                         neighbour.getElementOf().getFullname());

    } else {
      //Incoming relation

      neighbour = relation.getFromConcept();

      System.out.println(relation.getOfType().getInverseName() + " concept #" + neighbour.getId() +
                         " of type " + neighbour.getOfType().getFullname() + " from source " +
                         neighbour.getElementOf().getFullname());

    }
  }
}
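The accessors described under 'Network browsing' are enough to build complete traversal algorithms. The following stand-alone sketch shows a breadth-first search that ignores relation direction. To keep it runnable without the Ondex libraries it uses hypothetical stand-ins: the class Rel plays the role of a relation with its 'from' and 'to' ends, and relationsOf() plays the role of the graph's relations-of-concept lookup. None of these names belong to the Ondex API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are NOT the Ondex API classes.
class TraversalSketch {

    /** Stand-in for a directed relation between two concept ids. */
    static final class Rel {
        final int from;
        final int to;

        Rel(int from, int to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Stand-in for the "relations of concept" lookup: all relations touching the id. */
    static List<Rel> relationsOf(List<Rel> all, int conceptId) {
        List<Rel> result = new ArrayList<>();
        for (Rel r : all) {
            if (r.from == conceptId || r.to == conceptId) {
                result.add(r);
            }
        }
        return result;
    }

    /** Breadth-first search ignoring direction; returns concept ids in visiting order. */
    static Set<Integer> reachable(List<Rel> all, int start) {
        Set<Integer> visited = new LinkedHashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int current = queue.remove();
            for (Rel r : relationsOf(all, current)) {
                // pick the end of the relation that is not the current concept
                int neighbour = (r.from == current) ? r.to : r.from;
                if (visited.add(neighbour)) {
                    queue.add(neighbour);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Rel> rels = new ArrayList<>();
        rels.add(new Rel(1, 2));
        rels.add(new Rel(3, 2));
        rels.add(new Rel(4, 5));
        System.out.println(reachable(rels, 1));
    }
}
```

A direction-aware variant would follow a relation only when its 'from' end is the current concept.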

Attributes: The GDS elements, which serve as general attributes to both concepts and relations, are modeled by the interface 'GDS'. Like concepts and relations, GDSs have hierarchical types, called 'attribute names'. This functionality is achieved through implementing the same interface as concepts and relations: Instantiation<H extends Hierarchy>. Thus, the attribute name of a GDS is also accessed using the method 'getOfType()'. The value of a GDS, on the other hand, is returned by the method 'getValue()'. Remember that the Java type of the value is determined by the attribute name. More about the function of 'AttributeName' can be found in the metadata section. To access the GDS elements on a concept or relation, simply use its 'getGDS()' method.

The two specialized attributes 'ConceptAccession' and 'ConceptName' can be accessed by calling a concept's 'getConceptAccessions()' and 'getConceptName()' methods, respectively. As a reminder: concept accessions are used as cross-references to external databases. The actual cross-reference value is returned by the accession's 'getAccession()' method. Its namespace, represented by a CV object, can be accessed using the 'getElementOf()' method. A concept name, in turn, offers the 'getName()' method for accessing the actual string and a method called 'isPreferred()', which returns whether the name is the concept's preferred name.

Example code snippet:

/**
 * Example method that searches for matching accessions of a given namespace
 * between the two concepts with id cid1 and cid2
 */
public ConceptAccession findMatchingAccessionOfCV(int cid1, int cid2, CV namespace) {

  //prepare set to store accessions of the given namespace for concept 1
  Set<ConceptAccession> suitableAccessions = new HashSet<ConceptAccession>();

  ONDEXConcept concept1 = graph.getConcept(cid1);

  //fill the prepared set accordingly
  for (ConceptAccession acc : concept1.getConceptAccessions()) {
    if (acc.getElementOf().equals(namespace)) {
      suitableAccessions.add(acc);
    }
  }

  ONDEXConcept concept2 = graph.getConcept(cid2);

  //check if the set contains any of the accessions of concept 2
  for (ConceptAccession acc : concept2.getConceptAccessions()) {
    if (suitableAccessions.contains(acc)) {
      //if found, return it
      return acc;
    }
  }

  return null;
}

Creating concepts and relations: Concepts and relations can be created on a graph using the 'EntityFactory' class, which is accessible via the graph's 'getFactory()' method. It provides many overloaded versions of the 'createConcept()' and 'createRelation()' methods with different parameter sets.

Example code snippet:

//fetch metadata from graph.
CV cvUnknown = graph.getMetaData().getCV("unknown");
ConceptClass ccThing = graph.getMetaData().getConceptClass("Thing");
EvidenceType etIMPD = graph.getMetaData().getEvidenceType("IMPD");
RelationType rt = graph.getMetaData().getRelationType("r");

//create two "Thing" concepts
ONDEXConcept c1 = graph.getFactory().createConcept("test1", cvUnknown, ccThing, etIMPD);
ONDEXConcept c2 = graph.getFactory().createConcept("test2", cvUnknown, ccThing, etIMPD);

//connect them with a "related" relation
ONDEXRelation r1 = graph.getFactory().createRelation(c1, c2, rt, etIMPD);

Metadata

Concept classes, relation types, CVs, evidence types and attribute names are the so-called metadata of the graph. They all inherit the super-interface 'MetaData' with its three methods 'getId()', 'getFullname()' and 'getDescription()', all of which return strings. Additionally, concept classes, relation types and attribute names are able to form specialization hierarchies. This is modelled through their common interface Hierarchy. The interface provides them with a number of appropriate methods: the 'getSpecializationOf()' method returns the type's respective parent type, while the methods isAssignableFrom() and isAssignableTo() can be used to determine whether another type is a super- or sub-type of the given type.
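The idea of a specialization hierarchy can be illustrated with a small stand-alone sketch. The class TypeNode is a hypothetical stand-in, not the Ondex 'Hierarchy' interface. Two points are assumptions made for this sketch only: a type is treated as assignable to itself, and isAssignableFrom() follows the direction used by java.lang.Class. The example hierarchy, in which 'Protein' and 'Gene' specialize 'Thing', is also invented.

```java
// Hypothetical stand-in for illustration; this is NOT the Ondex Hierarchy interface.
class HierarchySketch {

    /** A type with an optional parent type. */
    static final class TypeNode {
        final String id;
        final TypeNode parent; // null for a root type

        TypeNode(String id, TypeNode parent) {
            this.id = id;
            this.parent = parent;
        }

        /** True if this type is 'other' or a direct or indirect specialization of it. */
        boolean isAssignableTo(TypeNode other) {
            // walk up the chain of parent types until 'other' or the root is reached
            for (TypeNode t = this; t != null; t = t.parent) {
                if (t == other) {
                    return true;
                }
            }
            return false;
        }

        /** Mirror image of isAssignableTo (assumed direction, see lead-in). */
        boolean isAssignableFrom(TypeNode other) {
            return other.isAssignableTo(this);
        }
    }

    public static void main(String[] args) {
        TypeNode thing = new TypeNode("Thing", null);
        TypeNode protein = new TypeNode("Protein", thing);
        TypeNode gene = new TypeNode("Gene", thing);

        System.out.println(protein.isAssignableTo(thing));
        System.out.println(thing.isAssignableFrom(protein));
        System.out.println(protein.isAssignableTo(gene));
    }
}
```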

All metadata is accessible using the graph's getMetaData() method, which returns an object of type 'ONDEXGraphMetaData'. ONDEXGraphMetaData features a great number of different methods to create, fetch and delete parts of any of the different kinds of metadata.

The Ondex workflow engine initializes its graphs using a standard metadata set which is parsed from a file in the data/xml/ subdirectory of your local Ondex installation. The current version of that file can be found in the SVN repository. You can edit it conveniently using the Ondex metadata editor, which can be downloaded from the Nexus server.

Description of advanced metadata elements

Creating new Ondex graphs

In the likely case that you are developing an Ondex workflow plug-in, you will not need to create the graph object yourself, as it is provided to each plug-in by the workflow engine. You can access it through a protected field that every plug-in possesses:

protected ONDEXGraph graph;

Otherwise, if you want to develop an independent software tool that uses an Ondex graph, you will need to create a graph object. As 'ONDEXGraph' is just an interface, like the rest of the API, you will need to employ an implementation of the API to do so. The different graph implementations are packaged in the following maven artifacts:

See the implementations chapter for more information.

Whichever graph implementation you decide on, you need to note the following: Before you instantiate a graph, you should make sure that the JVM system properties contain the entry ondex.dir, pointing to your Ondex data directory. The system requires it to read the log4j.config and config.xml files in order to set up and configure the logging and security subsystems. This property can be set anywhere in your code as long as it is executed before the graph is created.
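A quick sanity check of the ondex.dir property before creating a graph can save some debugging time. The sketch below uses only standard JDK calls. The class name OndexDirCheck and the fallback path are assumptions made for illustration, and the check covers only the existence of the directory, not the presence of the configuration files in it.

```java
import java.io.File;

// Illustrative helper, not part of the Ondex API.
class OndexDirCheck {

    /** True if the 'ondex.dir' system property is set and points to an existing directory. */
    static boolean ondexDirIsUsable() {
        String dir = System.getProperty("ondex.dir");
        return dir != null && new File(dir).isDirectory();
    }

    public static void main(String[] args) {
        // only set a default if nothing was passed on the command line via -Dondex.dir=...
        if (System.getProperty("ondex.dir") == null) {
            System.setProperty("ondex.dir", "/home/foobar/ondex/data"); // example path
        }
        System.out.println("ondex.dir usable: " + ondexDirIsUsable());
    }
}
```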

To enable logging on your graph, create a new 'net.sourceforge.ondex.logging.ONDEXCoreLogger' and add it to your graph. Most implementations already accept the logger as a constructor argument.

Memory implementation

In most cases the 'memory' implementation of the graph is the best choice. It can be found in the artifact net.sourceforge.ondex:memory (available from the Rothamsted Research Nexus server). You will find the 'MemoryGraph' implementing classes in the package net.sourceforge.ondex.core.memory.

Example code snippet:

//set the data directory location
System.setProperty("ondex.dir","/home/foobar/ondex/data");

//create a new logger
ONDEXCoreLogger logger = new ONDEXCoreLogger();

//create the graph
ONDEXGraph graph = new MemoryGraph("My graph", logger);

Persistent implementation

If you choose to employ the BerkeleyDB persistent implementation instead, you will require the net.sourceforge.ondex:berkeley artifact (also available from the Rothamsted Research Nexus server). You will find the required class 'BerkeleyEnv' in the package net.sourceforge.ondex.core.persistent.

In order to create a persistent graph, you first have to create a BerkeleyEnv object, which serves as a factory for the graph. The graph itself can then be obtained from it using the 'getAbstractONDEXGraph()' method. Another important aspect of the persistent implementation is that it must be shut down safely before the JVM terminates. This is done by calling the 'cleanup()' method on the Berkeley environment object.

Example code snippet:

//set the data directory location
System.setProperty("ondex.dir","/home/foobar/ondex/data");

//create a new logger
ONDEXCoreLogger logger = new ONDEXCoreLogger();

//set the location where the database should be created
String dbLocation = "/home/foobar/berkeley/";

//create a new berkeley environment
BerkeleyEnv env = new BerkeleyEnv(dbLocation, "My graph", logger);

//extract the graph from the berkeley environment
ONDEXGraph graph = env.getAbstractONDEXGraph();

//do something with your graph here

//before your application exits you should cleanup the database.
env.cleanup();
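In the example above the cleanup call is skipped if the code working on the graph throws an exception. A try/finally block is a common Java pattern for guaranteeing the shutdown step. The sketch below demonstrates the pattern with a hypothetical stand-in class FakeEnv, so that it runs without the Ondex libraries; FakeEnv is not the BerkeleyEnv class and only imitates its 'cleanup()' method.

```java
// Hypothetical stand-in for illustration; FakeEnv is NOT the BerkeleyEnv class.
class CleanupSketch {

    /** Imitates an environment that must be shut down explicitly. */
    static final class FakeEnv {
        boolean cleanedUp = false;

        void cleanup() {
            cleanedUp = true;
        }
    }

    /** Runs the work and guarantees that cleanup() is called even if the work throws. */
    static void runWithCleanup(FakeEnv env, Runnable work) {
        try {
            work.run();
        } finally {
            env.cleanup();
        }
    }

    public static void main(String[] args) {
        FakeEnv env = new FakeEnv();
        try {
            runWithCleanup(env, () -> {
                throw new IllegalStateException("simulated failure");
            });
        } catch (IllegalStateException e) {
            System.out.println("work failed, cleaned up: " + env.cleanedUp);
        }
    }
}
```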

Advanced aspects

Trinary relations

Apart from standard binary relations that connect one concept with another, Ondex is also capable of modelling so-called 'trinary relations'. In addition to the 'from' and 'to' concepts, they reference a 'qualifier' concept. It can be queried from the relation using its 'getQualifier()' method. If none exists (i.e. the relation is binary) the method will simply return 'null'.

Example code snippet:

/**
 * Example method returning whether a relation is
 * trinary or not
 */
public boolean isRelationTrinary(int relationID) {

  ONDEXRelation relation = graph.getRelation(relationID);

  return relation.getQualifier() != null;

}

Contexts

Another advanced aspect of the Ondex graph API is the notion of 'contexts'. Any concept or relation can be part of one or more contexts. For example, given a graph of a metabolic network, all enzymes, metabolites and their connecting relations that form the phenylalanine synthesis pathway could be declared to be part of the context of the phenylalanine synthesis pathway. To make concepts and relations part of a context, they are simply tagged with the context's identifier. A notable detail of this feature is that a context identifier is itself an Ondex concept. So in the above example a concept of type 'Pathway' named 'phenylalanine synthesis' would serve as the context identifier. This allows information about the context itself to be stored in Ondex.

ONDEXConcept and ONDEXRelation both have the following three methods for maintaining contexts:

In addition, the ONDEXGraph can be queried for the contents of any context using its methods 'Set<ONDEXConcept> getConceptsOfContext(ONDEXConcept)' and 'Set<ONDEXRelation> getRelationsOfContext(ONDEXConcept)'.

Example code snippet:

/**
 * This example method gets the contents of both context1 and context2 and adds the intersection
 * of both sets to context3
 */
public void markContextIntersection(ONDEXConcept context1, ONDEXConcept context2, ONDEXConcept context3) {
  //get all concepts that belong to contexts 1 and 2
  Set<ONDEXConcept> conceptsOfContext1 = graph.getConceptsOfContext(context1);
  Set<ONDEXConcept> conceptsOfContext2 = graph.getConceptsOfContext(context2);

  //intersect the two sets (on a copy, so the set returned by the graph is left untouched)
  Set<ONDEXConcept> conceptsOfContext3 = new HashSet<ONDEXConcept>(conceptsOfContext1);
  conceptsOfContext3.retainAll(conceptsOfContext2);


  //get all relations that belong to contexts 1 and 2
  Set<ONDEXRelation> relationsOfContext1 = graph.getRelationsOfContext(context1);
  Set<ONDEXRelation> relationsOfContext2 = graph.getRelationsOfContext(context2);

  //intersect them too (again on a copy)
  Set<ONDEXRelation> relationsOfContext3 = new HashSet<ONDEXRelation>(relationsOfContext1);
  relationsOfContext3.retainAll(relationsOfContext2);


  //iterate over the concepts that are the intersection of 1 and 2
  for (ONDEXConcept concept : conceptsOfContext3) {

    //and mark them as part of context 3
    concept.addContext(context3);

  }

  //iterate over the relations that are the intersection of 1 and 2
  for (ONDEXRelation relation : relationsOfContext3) {
  
    //and mark them, too, as part of context 3
    relation.addContext(context3);

  }
}

Access permissions

Users, sessions and threads

Ondex uses an access permissions system very similar to the one found on Unix file systems. Users and groups can be created and can have different access levels on Ondex graphs and their contents. Every JVM thread that runs on the Ondex system belongs to a certain user. This is realized by assigning a session object holding the user's credentials to the respective thread. This way different threads belonging to different users can run on the same JVM, accessing the same set of graphs according to their respective permissions.

To find out what user is logged into the current thread you can use the static method 'Session.getSessionForThread()'. This will return a 'Session' object, which can be used to find out the current user id and group id. The methods getUID() and getGID() will do that.

To log in a different user on the current thread, you will of course require the user name and password. Use the static method 'Session.getSession(String uname, String pwd)' to retrieve the corresponding Session, then associate it with the current thread using 'Session.setSessionForThread(Session s)'.
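How a session can be bound to a thread is sketched below with the standard java.lang.ThreadLocal class. This is only an illustration of the concept: the classes ThreadSessionSketch and SessionSketch are hypothetical stand-ins, and nothing here is meant to describe how the Ondex 'Session' class is actually implemented.

```java
// Hypothetical stand-in for illustration; this is NOT the Ondex Session class.
class ThreadSessionSketch {

    /** Minimal session holding a user id and a group id. */
    static final class SessionSketch {
        final int uid;
        final int gid;

        SessionSketch(int uid, int gid) {
            this.uid = uid;
            this.gid = gid;
        }
    }

    // one independent slot per thread
    private static final ThreadLocal<SessionSketch> CURRENT = new ThreadLocal<>();

    static void setSessionForThread(SessionSketch s) {
        CURRENT.set(s);
    }

    static SessionSketch getSessionForThread() {
        return CURRENT.get();
    }

    public static void main(String[] args) throws InterruptedException {
        setSessionForThread(new SessionSketch(1, 10));

        // a second thread logs in as a different user without affecting the first
        Thread worker = new Thread(() -> {
            setSessionForThread(new SessionSketch(2, 20));
            System.out.println("worker uid: " + getSessionForThread().uid);
        });
        worker.start();
        worker.join();

        System.out.println("main uid: " + getSessionForThread().uid);
    }
}
```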

Permission levels

Permissions on Ondex graphs are handled on a per-element basis. That is, every concept, every relation, every metadata element and the graph itself can determine independently from one another, exactly what user has what permission on them. This strongly depends on the owner of the element. Each element has a specific owner (i.e. an associated user id) and a specific group (i.e. an associated group id, usually the group in which the owner participates). The owner can define three different access scopes on each of his elements, modeled by the Enum 'net.sourceforge.ondex.core.security.perm.AccessScope':

There are five different permission levels, modeled by the Enum 'net.sourceforge.ondex.core.security.perm.Permission':

  1. NONE: The user has no access to the element
  2. GET: The user can retrieve information from the element
  3. CREATE: The user can retrieve information from the element and add information to it
  4. UPDATE: The user can retrieve and add elements and associated attributes and can also edit those associated attributes
  5. DELETE: The user can retrieve, add, modify or delete elements and associated attributes.

Whenever a method on a graph, a concept or a relation is called, Ondex checks whether the user associated with the executing thread has the permission to do so. For example, a thread associated with user 'foo' calls the method 'getConceptAccessions()' on the concept 'bar'. As this method requires permission level 'GET', Ondex will check whether 'foo' has the 'GET' permission on 'bar'. If he does, the method will perform as usual; if not, the method will throw an 'AccessDeniedException'.
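The permission check can be pictured as a comparison between the level a user holds and the level a method requires. The sketch below assumes that the five levels are cumulative, as their descriptions suggest, so that a higher level includes all lower ones. The enum Level and the methods allows() and check() are hypothetical stand-ins; they mirror the level names from the list above but are not the Ondex 'Permission' enum, and a standard SecurityException stands in for the 'AccessDeniedException'.

```java
// Hypothetical stand-in for illustration; this is NOT the Ondex Permission enum.
class PermissionSketch {

    /** Levels in ascending order; each level is assumed to include the ones before it. */
    enum Level { NONE, GET, CREATE, UPDATE, DELETE }

    /** True if a user holding 'granted' may perform an operation that needs 'required'. */
    static boolean allows(Level granted, Level required) {
        return granted.compareTo(required) >= 0;
    }

    /** Throws if the granted level is insufficient for the required one. */
    static void check(Level granted, Level required) {
        if (!allows(granted, required)) {
            throw new SecurityException("requires " + required + " but user has " + granted);
        }
    }

    public static void main(String[] args) {
        // a user with UPDATE permission calls a method that requires GET
        check(Level.UPDATE, Level.GET);
        System.out.println("GET allowed for UPDATE user");

        // a user with GET permission calls a method that requires DELETE
        try {
            check(Level.GET, Level.DELETE);
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```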

Permission maintenance

To query or alter the settings for a certain graph element, the class 'net.sourceforge.ondex.security.perm.GlobalPermissions' can be used. To access the GlobalPermissions object for the graph you are working on, use the method 'getInstance(long forSID)' with the graph's SID:

GlobalPermissions gp = GlobalPermissions.getInstance(graph.getSID());

The object itself features a great number of methods for accessing and altering permissions on any element of the concerned graph. However, only the owner of the respective element is allowed to change its permission settings. To save space in this tutorial, only the methods operating on concepts are demonstrated here. The remaining methods work very similarly and are documented in the JavaDocs.

Example code snippet:

/**
 * An example method that iterates over all concepts owned
 * by the current user and sets the permissions for everyone
 * else to level NONE.
 */
public void restrictAccessToMyConcepts(ONDEXGraph graph) {

  //get permissions object
  GlobalPermissions gp = GlobalPermissions.getInstance(graph.getSID());

  //retrieve own user id
  int myUserID = Session.getSessionForThread().getUID();

  //iterate over all concepts
  for (ONDEXConcept concept : graph.getConcepts()) {

    //if current concept is owned by current user remove other permissions
    if (gp.getConceptOwnerUserID(concept.getId()) == myUserID) {
      gp.setConceptPermission(concept.getId(), AccessScope.GROUP, Permission.NONE);
      gp.setConceptPermission(concept.getId(), AccessScope.OTHER, Permission.NONE);
    }

  }

}

Exceptions and errors

The Ondex API comes with a hierarchy of exception types which are used throughout the Ondex codebase. For convenience, the types most frequently used in the graph are runtime exceptions and thus do not need to be handled explicitly. All of these exception types can be found in the package 'net.sourceforge.ondex.exception.type'.

Ondex exception class hierarchy

On the one hand, there are exceptions that can be thrown by the graph API or its implementations. The most important of these is the 'AccessDeniedException', which is thrown whenever a method is called that tries to access a graph or graph element to which the current user has no access. Furthermore, the 'WrongParameterException' reports that a parameter given to one of the API methods was invalid. Its three subclasses 'NullValueException', 'EmptyStringException' and 'ObjectTypeMismatchException' further specify in what way the given parameter was invalid. Finally, the 'DataLossException' indicates that a severe error occurred in the graph API's implementation. This may report disk failures, database errors or similarly unpleasant events.

On the other hand, there are pre-defined exception types that you can use in your own code to notify higher levels of the system architecture of problems in communicating with the API. The contained classes are especially useful for the development of new workflow plug-ins. All of them are subtypes of the class 'PluginException' so they can easily be caught by higher level execution layers such as the Ondex workflow engine.

Events and logging

...