This manual was written to help developers acquaint themselves with the Ondex graph API.
The background section provides a description of the idea behind the Ondex graph data structure and gives a short overview of the different available graph implementations.
The main section discusses the details of the API's structure and how to use it. First the general aspects of handling graph elements are discussed. Subsequently, the role of the metadata behind graphs is explained. Finally, some advanced features like trinary relations, the permissions system and logging will be discussed.
Ondex employs an enriched graph data structure to store any kind of
integrated information. The data structure primarily consists of graph
nodes, called 'concepts', and edges that connect them, called
'relations'. Each concept and each relation has a certain
type, called 'concept class' for concepts and 'relation type' for
relations.
For example, imagine a concept of type 'Gene' being connected via a relation of type 'encodes' to another concept of type 'Protein'.
Ondex features controlled sets of terms for provenance and evidence information, called 'CVs' and 'evidence types'.
'CVs' describe the data sources from which data was imported. Every concept
must be associated with a 'CV' reference. For example, the 'Protein' concept could have been imported from UniProtKB, which would be indicated by assigning the 'UniprotKB' CV element to the concept.
'Evidence types' on the other hand describe the evidence that supports
the information represented by a concept or relation. Thus, all
concepts and relations carry at least one evidence type reference. For
example, if a relation has been determined using an accession mapping
method, it would have the 'accession_based_mapping' evidence type attached.
Concepts and relations can carry further attributes. The most important
attributes are the GDS (General Data Storage) elements, which allow the
attachment of any kind of Java object to concepts or relations. GDS
elements are identified by 'attribute names' which are again part of a
controlled set. These attribute names not only provide a context for
their respective GDS elements, but also determine their type (i.e. Java
class). For example, a relation 'has_similar_sequence' between two protein concepts might carry a GDS with a value of 0.00001 under the attribute name 'e-value', which determines it to be of type java.lang.Double.
Other than GDS elements, there are also certain special attributes for Ondex concepts. Cross references to external databases are stored in so-called 'concept accession' objects. These cross references may point to the database from which the concept was originally imported or any other. Concept accessions consist of two parts: a CV object specifying the database and the actual entry identifier. Our example protein concept from UniProtKB would carry a concept accession pointing to a UniProtKB entry.
Another special attribute is the 'concept name'. A concept can have more than one 'concept name', allowing synonyms to be added. A boolean flag can be used to make a 'concept name' a preferred name. This is mainly used for visualisation purposes. The example protein concept could have the preferred name 'cdc13p' and the synonym 'YDL220C'.
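The data structure described above can be summarised in a few lines of plain Java. The following is only a minimal sketch under stated assumptions: all class names in it (DataModelSketch, AttrName, Element, Concept, Relation) are hypothetical stand-ins invented for illustration and are not the Ondex interfaces. It shows how a type, a data source, evidence and typed attributes hang together, and how an attribute name can enforce the Java class of its values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types for illustration; these are NOT the Ondex interfaces.
class DataModelSketch {

    /** A controlled attribute name that fixes the Java type of its values. */
    static final class AttrName {
        final String id;
        final Class<?> type;

        AttrName(String id, Class<?> type) {
            this.id = id;
            this.type = type;
        }
    }

    /** State shared by concepts and relations: a type, evidence and typed attributes. */
    static class Element {
        final String ofType; // concept class or relation type
        final List<String> evidence = new ArrayList<>();
        private final Map<String, Object> attributes = new HashMap<>();

        Element(String ofType, String evidenceType) {
            this.ofType = ofType;
            this.evidence.add(evidenceType); // every element carries at least one evidence type
        }

        /** Stores a value, rejecting values that do not match the attribute name's type. */
        void setAttribute(AttrName name, Object value) {
            if (!name.type.isInstance(value)) {
                throw new IllegalArgumentException(name.id + " expects " + name.type.getName());
            }
            attributes.put(name.id, value);
        }

        Object getAttribute(AttrName name) {
            return attributes.get(name.id);
        }
    }

    static final class Concept extends Element {
        final String cv; // data source, e.g. "UniProtKB"
        final List<String> names = new ArrayList<>();

        Concept(String conceptClass, String cv, String evidenceType) {
            super(conceptClass, evidenceType);
            this.cv = cv;
        }
    }

    static final class Relation extends Element {
        final Concept from;
        final Concept to;

        Relation(Concept from, Concept to, String relationType, String evidenceType) {
            super(relationType, evidenceType);
            this.from = from;
            this.to = to;
        }
    }

    /** Builds the Gene 'encodes' Protein example from the text. */
    static Relation buildExample() {
        Concept gene = new Concept("Gene", "unknown", "IMPD");
        Concept protein = new Concept("Protein", "UniProtKB", "IMPD");
        protein.names.add("cdc13p");
        return new Relation(gene, protein, "encodes", "IMPD");
    }
}
```

In this sketch an attribute name created as new AttrName("e-value", Double.class) would accept 0.00001 and reject a string, which mirrors the way attribute names determine the type of their GDS values.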
The Ondex API is merely a collection of Java interfaces that define the data structure and its access methods. The way in which the data is physically represented and stored is not determined by the API, but by its implementation. There are several different implementations of the Ondex graph API available. Most notably:
New, experimental implementations are currently under development, including:
The Ondex graph API comes as the Maven artifact net.sourceforge.ondex:api. The latest snapshot can be downloaded from the Rothamsted Research Nexus server. The main interface classes are found in the package net.sourceforge.ondex.core.
Hence, if your IDE does not handle import statements automatically, you
can import the complete contents of the package directly:
import net.sourceforge.ondex.core.*;
There are additional documents available that capture the technical details of the Ondex graph API:
In the Ondex API the graph is represented by the class ONDEXGraph.
Every Ondex graph is identified by a unique long integer ID called
'SID'. It can be queried from the graph and also from all of its
elements so there can be no confusion as to which graph a given concept
or relation belongs. The SID is retrieved using the method getSID(), which is declared in the interface 'ONDEXAssociable'. This interface exists for this purpose only (see class diagram).
Concepts and relations are modeled by the two classes ONDEXConcept and ONDEXRelation. They can be accessed from the ONDEXGraph object using several querying methods. Like the graph itself, they also feature unique IDs which can be queried using the getId()
method. Unlike the SID, these IDs are plain integers. Concepts and
relations inherit this and other methods from their common
super-interface 'ONDEXEntity'.
Example code snippet:
/**
 * A small example method that would check if the SID of a concept
 * is consistent with the graph that owns it.
 */
public void assertSidConsistencyForConcept(int conceptID) {
    long graphSID = graph.getSID();
    ONDEXConcept c = graph.getConcept(conceptID);
    assert(c.getSID() == graphSID);
}
Element iterators: As Ondex graphs can grow extremely
large, queries to the system potentially return a huge number of
elements. Thus, Ondex employs a special data container for answering
such queries: the ONDEXIterator<T> is an interface
that can be used much like a regular Java iterator. However, depending
on the graph implementation used, the iterator can work like a pointer
on a database rather than a simple Java collection. Therefore, not all
elements are required to reside in memory at the same time.
ONDEXIterator features a close() method to
indicate that the iterator is no longer required and can be destroyed.
It is essential that this method is called whenever iteration has
finished in order to reset any potential underlying database pointers.
Ondex iterators can be transformed into regular Java collections using
the toCollection() method. However, you should make sure
to use this method only when you are certain that the result of your
query is of relatively small size.
ONDEXIterator<T> can be used for any data element in Ondex, including metadata and attributes. However, there is a subtype of ONDEXIterator<T>, called ONDEXView<T>, which is used only for concepts and relations. It has additional functionality, such as set operations. Refer to the Javadoc for more information.
Example code snippet:
/*
 * Query the graph for all concepts
 * on a given ONDEXGraph graph
 */
ONDEXView<ONDEXConcept> concepts = graph.getConcepts();
while (concepts.hasNext()) {
    ONDEXConcept c = concepts.next();
    //do something with c here...
}
/*
 * IMPORTANT: Close iterator after use!
 */
concepts.close();

/*
 * Query the graph's metadata for all CVs
 * This time use toCollection() for a shortcut
 */
for (CV cv : graph.getMetaData().getCVs().toCollection()) {
    //do something with cv here...
}
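If the loop body can throw an exception, the call to close() in the snippet above would be skipped. One common way to guard against this is a try/finally block. The following is a minimal, self-contained sketch of that pattern; the types ClosableIterator and ListBackedIterator are hypothetical stand-ins written for this example and are not part of the Ondex API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an iterator with close(); NOT the Ondex interface.
class IteratorCloseSketch {

    interface ClosableIterator<T> extends Iterator<T> {
        void close();
    }

    /** In-memory stand-in that records whether close() was called. */
    static final class ListBackedIterator<T> implements ClosableIterator<T> {
        private final Iterator<T> delegate;
        boolean closed = false;

        ListBackedIterator(List<T> items) {
            this.delegate = items.iterator();
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Collects all elements with the given prefix and closes the iterator even on failure. */
    static List<String> collectWithPrefix(ClosableIterator<String> it, String prefix) {
        List<String> result = new ArrayList<>();
        try {
            while (it.hasNext()) {
                String s = it.next();
                if (s.startsWith(prefix)) {
                    result.add(s);
                }
            }
        } finally {
            it.close(); // runs on normal completion and when an exception is thrown
        }
        return result;
    }
}
```

The same structure should carry over to an ONDEXView: obtain the view, iterate inside the try block, and call close() in the finally block.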
Network browsing: Relations are directed connections between two concepts, which can be accessed with the two methods 'getFromConcept()' and 'getToConcept()'. Furthermore, to find out what relations are adjacent to a given concept, you can use the graph's 'getRelationsOfConcept()' method. These methods are the elementary building blocks for any implementations of network traversal algorithms in Ondex.
Types of concepts and relations: As mentioned above, both concepts and relations need to have types. The getOfType() method can be used to access them. When called on a concept the method returns a ConceptClass object; and when called on a relation, it returns a RelationType object. More about concept classes and relation types can be found in the metadata section.
Provenance and evidence: The source from which a concept has been obtained is represented by the class 'CV'. Calling the getElementOf() method on a concept will return an object of that type. To access the evidence of a concept or relation, call its getEvidence() method, which will return an ONDEXIterator<EvidenceType> object. More about 'EvidenceType' and 'CV' can be found in the metadata section.
Example code snippet:
/**
 * Example method that prints out information about the immediate
 * neighbourhood of a concept.
 */
public void printNeighbourhoodOfConcept(int conceptID) {
    ONDEXConcept concept = graph.getConcept(conceptID);
    System.out.println("Concept #" + conceptID + " of type "
            + concept.getOfType().getFullname() + " from "
            + concept.getElementOf().getFullname()
            + " has the following connections:");
    for (ONDEXRelation relation : graph.getRelationsOfConcept(concept).toCollection()) {
        ONDEXConcept neighbour;
        //Find out whether it is an incoming or outgoing relation.
        if (relation.getFromConcept().equals(concept)) {
            //Outgoing relation
            neighbour = relation.getToConcept();
            System.out.println(relation.getOfType().getFullname()
                    + " concept #" + neighbour.getId()
                    + " of type " + neighbour.getOfType().getFullname()
                    + " from source " + neighbour.getElementOf().getFullname());
        } else {
            //Incoming relation
            neighbour = relation.getFromConcept();
            System.out.println(relation.getOfType().getInverseName()
                    + " concept #" + neighbour.getId()
                    + " of type " + neighbour.getOfType().getFullname()
                    + " from source " + neighbour.getElementOf().getFullname());
        }
    }
}
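Going one step further than the immediate neighbourhood, the same building blocks can drive a breadth-first traversal. The sketch below illustrates the idea on a deliberately simplified stand-in graph: concepts are plain int IDs, relations are {from, to} pairs, and the class NeighbourhoodSketch with its methods is hypothetical and not part of the Ondex API. The method relationsOf() plays the role that 'getRelationsOfConcept()' plays on a real graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in graph: concept IDs are plain ints, relations are {from, to} pairs.
class NeighbourhoodSketch {

    private final List<int[]> relations = new ArrayList<>();

    void addRelation(int fromId, int toId) {
        relations.add(new int[] {fromId, toId});
    }

    /** Stand-in for a "relations of concept" query: every relation touching the concept. */
    List<int[]> relationsOf(int conceptId) {
        List<int[]> result = new ArrayList<>();
        for (int[] r : relations) {
            if (r[0] == conceptId || r[1] == conceptId) {
                result.add(r);
            }
        }
        return result;
    }

    /** Breadth-first search returning all concepts within maxDepth steps of the start concept. */
    Set<Integer> within(int startId, int maxDepth) {
        Set<Integer> visited = new LinkedHashSet<>();
        Deque<int[]> queue = new ArrayDeque<>(); // entries are {conceptId, depth}
        visited.add(startId);
        queue.add(new int[] {startId, 0});
        while (!queue.isEmpty()) {
            int[] current = queue.poll();
            if (current[1] == maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            for (int[] r : relationsOf(current[0])) {
                // the neighbour is whichever end of the relation is not the current concept
                int neighbour = (r[0] == current[0]) ? r[1] : r[0];
                if (visited.add(neighbour)) {
                    queue.add(new int[] {neighbour, current[1] + 1});
                }
            }
        }
        return visited;
    }
}
```

Note that this sketch follows relations in both directions, just as the neighbourhood example above reports incoming and outgoing relations alike. A traversal that should respect direction would only follow relations whose 'from' end is the current concept.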
Attributes: The GDS elements, which serve as general attributes to both concepts and relations are modeled by the interface 'GDS<E extends ONDEXEntity>'. As the notation suggests, the interface needs to be parametrized over either ONDEXConcept or ONDEXRelation. The attribute name of a GDS can be accessed using the method 'getAttributeName()'. Its value, on the other hand, is returned by the method 'getValue()'. Remember that the Java type of the value is determined by the attribute name. More about the function of 'AttributeName' can be found in the metadata section. To access the GDS elements on a concept or relation simply use the 'getConceptGDS()' method or 'getRelationGDS()' method, respectively.
[Still to write: explain methods of attribute name in metadata section].
The two specialized attributes 'ConceptAccession' and 'ConceptName' can be accessed by calling a concept's 'getConceptAccessions()' and 'getConceptName()'
methods, respectively. As a reminder: concept accessions are used as
cross-references to external databases. The actual database key of an accession is
returned by its 'getAccession()' method. Its namespace, represented by a CV object, can be accessed using the 'getElementOf()' method.
Concept names, on the other hand, are queried via the concept's 'getConceptName()' method. Besides the 'getName()' method for accessing the actual string, a concept name features a method called 'isPreferred()', which returns whether the name is the concept's preferred name or not.
Example code snippet:
import java.util.HashSet;
import java.util.Set;

/**
 * Example method that searches for matching accessions of a given namespace
 * between the two concepts with id cid1 and cid2
 */
public ConceptAccession findMatchingAccessionOfCV(int cid1, int cid2, CV namespace) {
    //prepare set to store accessions of the given namespace for concept 1
    Set<ConceptAccession> suitableAccessions = new HashSet<ConceptAccession>();
    ONDEXConcept concept1 = graph.getConcept(cid1);
    //fill the prepared set accordingly
    for (ConceptAccession acc : concept1.getConceptAccessions().toCollection()) {
        if (acc.getElementOf().equals(namespace)) {
            suitableAccessions.add(acc);
        }
    }
    ONDEXConcept concept2 = graph.getConcept(cid2);
    //check if the set contains any of the accessions of concept 2
    for (ConceptAccession acc : concept2.getConceptAccessions().toCollection()) {
        if (suitableAccessions.contains(acc)) {
            //if found, return it
            return acc;
        }
    }
    //no matching accession was found
    return null;
}
Creating concepts and relations:
Concepts and relations can be created on a graph using the 'EntityFactory' class, which is accessible via the graph's 'getFactory()' method. It provides many overloaded versions of the 'createConcept()' and 'createRelation()' methods, each taking a different set of parameters.
Example code snippet:
//fetch metadata from the graph
CV cvUnknown = graph.getMetaData().getCV("unknown");
ConceptClass ccThing = graph.getMetaData().getConceptClass("Thing");
EvidenceType etIMPD = graph.getMetaData().getEvidenceType("IMPD");
RelationType rt = graph.getMetaData().getRelationType("r");

//create two "Thing" concepts
ONDEXConcept c1 = graph.getFactory().createConcept("test1", cvUnknown, ccThing, etIMPD);
ONDEXConcept c2 = graph.getFactory().createConcept("test2", cvUnknown, ccThing, etIMPD);

//connect them with a relation of type "r"
ONDEXRelation r1 = graph.getFactory().createRelation(c1, c2, rt, etIMPD);
Concept classes, relation types, CVs, evidence types and attribute
names are the so-called metadata of the graph. They all inherit from the
super-interface 'MetaData' with its three methods 'getId()', 'getFullname()' and 'getDescription()',
all of which return strings. Additionally, concept classes, relation
types and attribute names are able to form specialization hierarchies.
The respective parent entity is returned by the 'getSpecializationOf()' method.
All metadata is accessible using the graph's getMetaData() method, which returns an object of type 'ONDEXGraphMetaData'. ONDEXGraphMetaData features a great number of different methods to create, fetch and delete parts of any of the different kinds of metadata.
The Ondex workflow engine initializes its graphs using a standard metadata set which is parsed from a file in the data/xml/ subdirectory of your local Ondex installation. The current version of that file can be found in the SVN repository. You can edit it conveniently using the Ondex metadata editor, which can be downloaded from the Nexus server.
'RelationType'.
In addition to the previously described methods it also offers some
further getters and setters which can be used to manipulate the
semantics of a type:
- Inverse name: the name of the relation type when read in the opposite direction. For example, the inverse name of 'is_part_of' would be 'has_part'.
- Transitivity: if A is_part_of B and B is_part_of C then A is_part_of C.
- Reflexivity: for example the 'equals' relation. Every entity is always equal to itself.
- Symmetry: if A equals B then B equals A.
- Antisymmetry: if A 'is_greater_than' B then B cannot be greater than A.
'AttributeName'.
In addition to the generic methods for the retrieval and manipulation of
id, name and description it provides getters and setters for the
associated Java type and unit.
In the likely case that you are developing an Ondex workflow plug-in, you will not need to create the graph object yourself, as it is provided to each plug-in by the workflow engine. You can access it via a protected field that every plug-in possesses:
protected ONDEXGraph graph;
Otherwise, if you want to develop an independent software tool that
uses an Ondex graph, you will need to create a graph object. As 'ONDEXGraph' is just an interface, like the rest of the API, you will need to employ an implementation of the API to do so. See the implementations chapter for more information.
Whichever graph implementation you decide on, you need to note the
following: Before you instantiate a graph, you should make sure that
the JVM system properties contain the entry ondex.dir, pointing to your Ondex data directory. The system requires it to read the log4j.config and config.xml
files in order to set up and configure the logging and security
subsystems. This property can be set anywhere in your code as long as
it is executed before the graph is created.
To enable logging on your graph, create a new 'net.sourceforge.ondex.logging.ONDEXCoreLogger' and add it to your graph.
In most cases the 'memory' implementation of the graph is the best choice. It can be found in the artifact net.sourceforge.ondex:memory (available from the Rothamsted Research Nexus server). You will find the 'MemoryGraph' implementing classes in the package net.sourceforge.ondex.core.memory.
Example code snippet:
//set the data directory location
System.setProperty("ondex.dir", "/home/foobar/ondex/data");

//create a new logger
ONDEXCoreLogger logger = new ONDEXCoreLogger();

//create the graph
ONDEXGraph graph = new MemoryGraph("My graph", logger);
If you choose to employ the BerkeleyDB persistent implementation instead, you will require the net.sourceforge.ondex:berkeley artifact (also available from the Rothamsted Research Nexus server). You will find the required class 'BerkeleyEnv' in the package net.sourceforge.ondex.core.persistent.
The Berkeley environment class will create the graph for you. You can access it using the 'getAbstractONDEXGraph()'
method. Another important aspect of the persistent implementation is
that it must be shut down safely before the JVM is terminated.
This is done by calling the 'cleanup()' method on the Berkeley environment object.
Example code snippet:
//set the data directory location
System.setProperty("ondex.dir", "/home/foobar/ondex/data");

//create a new logger
ONDEXCoreLogger logger = new ONDEXCoreLogger();

//set the location where the database should be created
String dbLocation = "/home/foobar/berkeley/";

//create a new berkeley environment
BerkeleyEnv env = new BerkeleyEnv(dbLocation, "My graph", logger);

//extract the graph from the berkeley environment
ONDEXGraph graph = env.getAbstractONDEXGraph();

//do something with your graph here

//before your application exits you should cleanup the database.
env.cleanup();
Apart from standard binary relations that connect one concept with
another, Ondex is also capable of modelling so-called 'trinary
relations'. In addition to the 'from' and 'to' concepts they reference
a 'qualifier' concept. It can be queried from the relation using its 'getQualifier()' method. If none exists (i.e. the relation is binary) the method will simply return 'null'.
Example code snippet:
/**
 * Example method returning whether a relation is
 * trinary or not
 */
public boolean isRelationTrinary(int relationID) {
    ONDEXRelation relation = graph.getRelation(relationID);
    return relation.getQualifier() != null;
}
Another advanced aspect of the Ondex graph API
is the notion of 'contexts'. Any concept or relation can be part of one
or more contexts. For example, given a graph of a metabolic network,
all enzymes, metabolites and their connecting relations that form the
phenylalanine synthesis pathway could be declared to be part of the context
of the phenylalanine synthesis pathway. To make concepts and relations
part of a context they are simply tagged with the context's identifier.
A notable aspect of this feature's design is that a
context identifier is itself an Ondex concept. So in the above example
a concept of type 'Pathway'
named 'phenylalanine synthesis' would serve as the context identifier.
This allows information about the context itself to be stored in Ondex.
ONDEXConcept and ONDEXRelation both have the following three methods for maintaining contexts:
- addContext(ONDEXConcept)
- removeContext(ONDEXConcept)
- ONDEXView<ONDEXConcept> getContext()
In addition, the ONDEXGraph can be queried for the contents of any context using its methods 'ONDEXView<ONDEXConcept> getConceptsOfContext(ONDEXConcept)' and 'ONDEXView<ONDEXRelation> getRelationsOfContext(ONDEXConcept)'.
Example code snippet:
/**
 * This example method gets the contents of both context1 and context2
 * and adds the intersection of both sets to context3
 */
public void markContextIntersection(ONDEXConcept context1, ONDEXConcept context2, ONDEXConcept context3) {
    //get all concepts that belong to contexts 1 and 2
    ONDEXView<ONDEXConcept> conceptsOfContext1 = graph.getConceptsOfContext(context1);
    ONDEXView<ONDEXConcept> conceptsOfContext2 = graph.getConceptsOfContext(context2);
    //intersect the two sets
    ONDEXView<ONDEXConcept> conceptsOfContext3 = ONDEXViewFunctions.and(conceptsOfContext1, conceptsOfContext2);
    //don't forget to close the iterators.
    conceptsOfContext1.close();
    conceptsOfContext2.close();

    //get all relations that belong to contexts 1 and 2
    ONDEXView<ONDEXRelation> relationsOfContext1 = graph.getRelationsOfContext(context1);
    ONDEXView<ONDEXRelation> relationsOfContext2 = graph.getRelationsOfContext(context2);
    //intersect them too
    ONDEXView<ONDEXRelation> relationsOfContext3 = ONDEXViewFunctions.and(relationsOfContext1, relationsOfContext2);
    //close their iterators too
    relationsOfContext1.close();
    relationsOfContext2.close();

    //iterate over the concepts that are the intersection of 1 and 2
    while (conceptsOfContext3.hasNext()) {
        ONDEXConcept concept = conceptsOfContext3.next();
        //and mark them as part of context 3
        concept.addContext(context3);
    }
    //close the intersected iterator
    conceptsOfContext3.close();

    //iterate over the relations that are the intersection of 1 and 2
    while (relationsOfContext3.hasNext()) {
        ONDEXRelation relation = relationsOfContext3.next();
        //and mark them, too, as part of context 3
        relation.addContext(context3);
    }
    //close the intersected iterator
    relationsOfContext3.close();
}
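Stripped of the iterator handling, the logic of the example above is a plain set intersection followed by tagging. The following self-contained sketch shows that core idea with ordinary Java collections. It is an illustration only: the class ContextSketch and its methods are hypothetical, and concepts as well as contexts are reduced to int IDs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in: contexts and tagged concepts are identified by plain int IDs.
class ContextSketch {

    // context concept ID -> IDs of the concepts tagged with that context
    private final Map<Integer, Set<Integer>> members = new HashMap<>();

    /** Tags a concept with a context. */
    void addContext(int conceptId, int contextId) {
        members.computeIfAbsent(contextId, k -> new TreeSet<>()).add(conceptId);
    }

    /** Returns a copy, so callers cannot modify the stored tags by accident. */
    Set<Integer> conceptsOfContext(int contextId) {
        return new TreeSet<>(members.getOrDefault(contextId, Collections.<Integer>emptySet()));
    }

    /** Tags every concept found in both source contexts with the target context. */
    void markIntersection(int context1, int context2, int target) {
        Set<Integer> shared = conceptsOfContext(context1);
        shared.retainAll(conceptsOfContext(context2)); // set intersection
        for (int conceptId : shared) {
            addContext(conceptId, target);
        }
    }
}
```

Because the context identifier is just another ID in this sketch, it mirrors the design point made above: in Ondex the identifier is itself a concept, so it can carry names, accessions and attributes of its own.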
Ondex uses an access permissions system very similar to the one found on Unix file systems. Users and groups can be created and can have different access levels on Ondex graphs and their contents. Every JVM thread that runs on the Ondex system belongs to a certain user. This is realized by assigning a session object holding the user's credentials to the respective thread. This way different threads belonging to different users can run on the same JVM, accessing the same set of graphs according to their respective permissions.
To find out what user is logged into the current thread you can use the static method 'Session.getSessionForThread()'. This will return a 'Session' object, which can be used to find out the current user id and group id. The methods getUID() and getGID() will do that.
To log in a different user on the current thread you will of course require the user name and password. Use the static method 'Session.getSession(String uname, String pwd)' to retrieve the corresponding Session, then associate it with the current thread using 'Session.setSessionForThread(Session s)'.
Permissions on Ondex graphs are handled on a per-element basis. That
is, every concept, every relation, every metadata element and the graph
itself can determine independently from one another, exactly what user
has what permission on them. This strongly depends on the owner
of the element. Each element has a specific owner (i.e. an associated
user id) and a specific group (i.e. an associated group id, usually the
group in which the owner participates). The owner can define three
different access scopes on each of his elements, modeled by the Enum 'net.sourceforge.ondex.core.security.perm.AccessScope':
- USER: The access permission that the owner has himself.
- GROUP: The access permission that all members of the associated group have.
- OTHER: The access permission that everyone else has.
There are five different permission levels, modeled by the Enum 'net.sourceforge.ondex.core.security.perm.Permission':
- NONE: The user has no access to the element.
- GET: The user can retrieve information from the element.
- CREATE: The user can retrieve information from the element and add information to it.
- UPDATE: The user can retrieve and add elements and associated attributes and can also edit those associated attributes.
- DELETE: The user can retrieve, add, modify or delete elements and associated attributes.
Whenever a method on a graph, a concept or a relation is called, Ondex
checks whether the user associated with the executing thread has the
permission to do so. For example, suppose a thread associated with user 'foo'
calls the method 'getConceptAccessions()' on the concept 'bar'. As this
method requires permission level 'GET', Ondex will check whether 'foo' has the 'GET' permission on 'bar'. If he does, the method will perform as usual; if not, the method will throw an 'AccessDeniedException'.
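The check described above can be pictured with a small, self-contained model. The sketch below is not the Ondex security code: the class PermissionSketch and everything in it are hypothetical stand-ins that reuse the names of the two enums for readability. It assumes the Unix-like rule that the owner scope is tested first, then the group scope, then everyone else, and it assumes that each permission level includes all lower levels, as the list of levels above suggests. A java.lang.SecurityException stands in for the 'AccessDeniedException'.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in types modelled on the description above; NOT the Ondex security classes.
class PermissionSketch {

    enum AccessScope { USER, GROUP, OTHER }

    // Declared in ascending order, so a higher level is assumed to include all lower ones.
    enum Permission { NONE, GET, CREATE, UPDATE, DELETE }

    static final class SecuredElement {
        final int ownerUid;
        final int groupGid;
        final Map<AccessScope, Permission> levels = new EnumMap<>(AccessScope.class);

        SecuredElement(int ownerUid, int groupGid,
                       Permission user, Permission group, Permission other) {
            this.ownerUid = ownerUid;
            this.groupGid = groupGid;
            levels.put(AccessScope.USER, user);
            levels.put(AccessScope.GROUP, group);
            levels.put(AccessScope.OTHER, other);
        }
    }

    /** Picks the scope that applies to the caller (assumed rule: owner, then group, then other). */
    static AccessScope scopeFor(SecuredElement e, int uid, int gid) {
        if (uid == e.ownerUid) {
            return AccessScope.USER;
        }
        if (gid == e.groupGid) {
            return AccessScope.GROUP;
        }
        return AccessScope.OTHER;
    }

    /** Returns true if the level granted to the caller's scope is at least the required level. */
    static boolean isAllowed(SecuredElement e, int uid, int gid, Permission required) {
        Permission granted = e.levels.get(scopeFor(e, uid, gid));
        return granted.compareTo(required) >= 0;
    }

    /** Throws if access is not allowed; SecurityException stands in for AccessDeniedException. */
    static void check(SecuredElement e, int uid, int gid, Permission required) {
        if (!isAllowed(e, uid, gid, required)) {
            throw new SecurityException("user " + uid + " lacks " + required);
        }
    }
}
```

With an element owned by user 1 in group 10 and the levels DELETE, GET and NONE for the three scopes, a member of group 10 could read the element but not add to it, and any other user would be denied even the GET level.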
To query or alter the settings for a certain graph element, use the class 'net.sourceforge.ondex.security.perm.GlobalPermissions'. To access the GlobalPermissions object for the graph you are working on, use the method 'getInstance(long forSID)' with the graph's SID:
GlobalPermissions gp = GlobalPermissions.getInstance(graph.getSID());
The object itself features many methods for accessing and altering permissions on any element of the graph concerned. However, only the owner of the respective element is allowed to change its permission settings. To save space in this tutorial, only the methods operating on concepts are demonstrated here. The remaining methods work very similarly and are documented in the JavaDocs.
- int getConceptOwnerUserID(int cid)
- int getConceptOwnerGroupID(int cid)
- Permission getConceptPermissionLevel(int cid, AccessScope scope)
- setConceptOwner(int cid, int uid)
- setConceptGroup(int cid, int gid)
- setConceptPermission(int cid, AccessScope scope, Permission permission)
Example code snippet:
/**
 * An example method that iterates over all concepts owned
 * by the current user and sets the permissions for everyone
 * else to level NONE.
 */
public void restrictAccessOnMyConcepts(ONDEXGraph graph) {
    //get permissions object
    GlobalPermissions gp = GlobalPermissions.getInstance(graph.getSID());
    //retrieve own user id
    int myUserID = Session.getSessionForThread().getUID();
    //iterate over all concepts
    ONDEXView<ONDEXConcept> concepts = graph.getConcepts();
    while (concepts.hasNext()) {
        ONDEXConcept concept = concepts.next();
        //if the current concept is owned by the current user, remove other permissions
        if (gp.getConceptOwnerUserID(concept.getId()) == myUserID) {
            gp.setConceptPermission(concept.getId(), AccessScope.GROUP, Permission.NONE);
            gp.setConceptPermission(concept.getId(), AccessScope.OTHER, Permission.NONE);
        }
    }
    //close the iterator
    concepts.close();
}
The Ondex API
comes with a hierarchy of exception types which are used throughout the
module. For convenience, the types most frequently used in the
graph are runtime exceptions and thus do not need to be handled explicitly. All of these exception types can be found in the package 'net.sourceforge.ondex.exception.type'.
On the one hand, there are exceptions that can be thrown by the graph API or its implementations. The most important of these is the 'AccessDeniedException',
which is thrown whenever a method is called that tries to access a graph
or graph element to which the current user has no access. Furthermore,
the 'WrongParameterException' reports that a parameter given to one of the API methods was invalid. Its three subclasses 'NullValueException', 'EmptyStringException' and 'ObjectTypeMismatchException' further specify in what way the given parameter was invalid. Finally, the 'DataLossException' indicates that a severe error occurred in the graph API's implementation. This may report disk failures, database errors or similarly unpleasant events.
On the other hand, there are pre-defined exception types that you can
use in your own code to notify higher levels of the system architecture
of problems in communicating with the API.
The contained classes are especially useful for the development of new
workflow plug-ins. All of them are subtypes of the class 'PluginException' so they can easily be caught by higher level execution layers such as the Ondex workflow engine.
[Still to write]