From data to knowledge - the Ondex System for integrating Life Sciences data sources
The Ondex SABR project (BB/F006039/1) was funded by the Biotechnology and Biological Sciences Research Council (BBSRC) for 3 years starting on April 1st 2008 under the SABR Systems Biology initiative as an e-tool project to support systems biology research. It was a collaboration between Rothamsted Research, The University of Manchester and Newcastle University. See information on our collaborators.
This project addressed the problem that a prerequisite to a systems approach to biological research (SABR) is the integration and analysis of heterogeneous experimental data, which are stored in hundreds of life-science databases and millions of scientific publications. Validation of the project software will be achieved by providing direct support to three research challenges from systems biology:
In addition, other biological projects will be supported through existing collaborations and by developing new ones through outreach activities, including:
All these biological application cases share a requirement to integrate a wide variety of datasets, and this project has been established to demonstrate that they can be supported using the proposed developments to the Ondex system. These developments build on and combine four major established components from leading experts:
The Ondex system stores data as a graph of concepts and relations.
Concepts represent data entities and relations link these entities together.
Additional semantic annotation is added using concept classes, relation types, evidence types and controlled vocabularies.
Data is imported by data-source-specific parsers.
Mapping methods create new relations between concepts.
Local and global consistency checks are performed.
Data integration can be configured and executed using web services via Taverna.
The Ondex system is open source and written in Java.
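The data model and mapping step described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Ondex Java API: the class names, field names, the "equivalent" relation type and the name-based matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Concept:
    """A data entity such as a gene or protein (field names are illustrative)."""
    cid: int
    name: str
    concept_class: str  # semantic annotation, e.g. "Protein"
    data_source: str    # where the parser imported it from


@dataclass(frozen=True)
class Relation:
    """A typed link between two concepts, with the evidence that supports it."""
    source: int
    target: int
    relation_type: str
    evidence: str


@dataclass
class Graph:
    concepts: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)

    def add_concept(self, concept):
        self.concepts[concept.cid] = concept

    def add_relation(self, relation):
        # Local consistency check: both ends of a relation must already exist.
        if relation.source not in self.concepts or relation.target not in self.concepts:
            raise ValueError("relation refers to an unknown concept")
        self.relations.add(relation)


def map_by_name(graph):
    """Hypothetical mapping method: link concepts of the same class that come
    from different data sources and share a name (case-insensitive)."""
    created = []
    ordered = sorted(graph.concepts.values(), key=lambda c: c.cid)
    for a, b in combinations(ordered, 2):
        if (a.name.lower() == b.name.lower()
                and a.concept_class == b.concept_class
                and a.data_source != b.data_source):
            relation = Relation(a.cid, b.cid, "equivalent", "name-based mapping")
            graph.add_relation(relation)
            created.append(relation)
    return created
```

In this sketch a parser would call `add_concept` for each imported entity, after which `map_by_name` adds cross-source relations. Real mapping methods are likely to use richer evidence, such as accessions or sequence similarity, than a plain name match.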
A wide range of biological applications can be addressed by setting up problem-specific data integration and analysis workflows. Although still a prototype, the current system has been used:
Both Ondex and Taverna are open source and are freely available to academic and commercial researchers. NaCTeM's text-mining services are freely available to the UK academic community.
The aim of the project is to build on the success of the Ondex prototype, and to create a robust, fully featured, extensible, easy-to-use and professionally supported e-tool that will underpin systems biology projects in the UK.
We propose to achieve this by:
These technological developments will make it possible to address a wide range of new biological problems. The Ondex system will provide data integration support not only to the BBSRC Systems Biology Centres (see Biological applications) but also to a range of other systems biology projects that will be supported through the outreach activities.
Mining Candidate Gene Networks From Genetic Studies of Crops and Animals
The QTLNetMiner project is a spinoff from the Ondex SABR project which re-uses components of the Ondex data integration framework and data visualisation tools to create a specialised resource for researchers working on complex traits in plants and animals. QTLNetMiner has been designed as a web-based resource that supports the identification and prioritisation of candidate functional genes using evidence from:
The achievements of the project include: the refinement of the Ondex graph visualisation tool as a Java applet known as OndexWeb; the development of indexing methods for the Ondex knowledge graph, so that it can deliver results interactively to client applications; and the creation of a method to rank genes based on the network of evidence that supports their functional relationship to the trait terms used in the query.
A QTLNetMiner user is presented with a simple-to-use query and visualisation interface. It supports several different views of the sets of candidate genes, so that the user can explore the source and quality of the evidence, found by traversal of the knowledge graph, that relates each gene to a trait term or set of terms. These views include:
The Ondex knowledge graph traversal and semantic motif query methods used by QTLNetMiner were developed by Matthew Hindle as part of his PhD research at Rothamsted.
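To give a rough idea of what a semantic motif query over a knowledge graph can look like, here is a small sketch in Python. It is not the QTLNetMiner implementation: the function name, the representation of a motif as a sequence of concept classes, and the choice to treat relations as undirected are all assumptions made for illustration.

```python
from collections import defaultdict


def find_motif_paths(concept_class, edges, start, motif):
    """Return every path from `start` whose concept classes follow `motif`.

    concept_class: dict mapping concept id -> concept class name
    edges: iterable of (id, id) pairs, treated as undirected in this sketch
    motif: sequence of concept class names; motif[0] must match `start`
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    if not motif or concept_class.get(start) != motif[0]:
        return []

    # Extend every partial path by one step per remaining motif element.
    paths = [[start]]
    for wanted in motif[1:]:
        paths = [
            path + [nxt]
            for path in paths
            for nxt in sorted(neighbours[path[-1]])
            if concept_class.get(nxt) == wanted and nxt not in path
        ]
    return paths
```

With a hypothetical motif such as `["Gene", "Protein", "Publication", "Trait"]`, each returned path is one chain of evidence linking a candidate gene to a trait concept. Counting or weighting those paths would be one simple way to score genes, although the ranking method actually used by QTLNetMiner is not described here.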
The QTLNetMiner project (BB/I023860/1) was originally funded by the BBSRC as a TRDF project for 12 months from January 2011 to 2012. It was a collaboration between Rothamsted Research and the Roslin Institute to demonstrate that the methods and software framework were sufficiently general to be used in crop plant and livestock animal studies. Since then, collaborations with the Feingold laboratory at INTA (Argentina) and with Uwe Scholz from IPK in Gatersleben have enabled us to further develop the software and implement knowledge bases for Tomato, Solanaceae and Barley species respectively.
Ondex Networks for your Website
OndexWeb is a new web-based implementation of the network visualization and exploration tools from the Ondex data integration platform. New features such as context-sensitive menus and annotation tools provide users with intuitive ways to explore and manipulate the appearance of heterogeneous biological networks. OndexWeb is open source, written in Java and can be easily embedded into web sites as an applet. OndexWeb supports loading data from a variety of network formats, such as XGMML, NWB, Pajek and OXL.
OndexWeb is a key user interface component used in the QTLNetMiner system.
Find out more and try it at the OndexWeb website http://ondex.rothamsted.ac.uk/OndexWeb.
Read our paper on what OndexWeb can do for you: Jan Taubert, Keywan Hassani-Pak, Minja Zorc, Christopher Rawlings (2013) Ondex Web: interactive web-based visualization and exploration of biological networks visualization and analysis. Bioinformatics. doi: 10.1093/bioinformatics/btt740
Accelerating Discovery by Mining and Visualising Integrated Chemogenomics Data
The motivation behind this project was to enable plant scientists and discovery chemists to collaborate more effectively by exploring a shared knowledge base of molecular and chemical data relating to agrochemistry and biological processes. Many industrial life-science organisations with a pipeline taking bioactive compounds to the marketplace struggle to deliver research information and knowledge discovery tools that meet the requirements of both biologists and chemists. Both groups of scientists have a shared understanding of biochemical pathways and molecular interaction networks, which provides an ideal basis for developing data visualisation and data analysis methods that reveal new information from integrated genetic, biochemical and chemical data sets.
These requirements are generic to companies working on bioactive compounds, notably agrochemical and pharmaceutical companies and the associated biotechnology SMEs that provide goods and services to them.
The major developments to Ondex, now available in the December 2013 release, support the analysis and visualisation of small molecule chemistry data. New interfaces have been developed to public bioactivity resources (e.g. ChEMBL), providing a link to target proteins. The methods for analysis and visualisation of small chemical compounds and their functional properties use the existing open source, Java-based Chemistry Development Kit (CDK). An interface to the Protein Data Bank in Europe (PDBe) enables Ondex users to visualise the 3D structure of the proteins held in the knowledge base. New parsers now support the import and integration of private data sets in standard chemistry data formats (e.g. SD files, SMILES and InChI).
A summary of these new features is below:
Example screenshots from the new features can be found here.
This project was funded by the Technology Strategy Board (TSB; TP Number 5082-33372) and the BBSRC (TS/I003707/1) from November 2012 until March 2013. The lead partner was Syngenta and we gratefully acknowledge the contributions from Mark Forster and Bob Vaughan.