Matching heterogeneous data sources using PhysioSpace

Positioning new experiments in the context of the increasing amount of experimental and clinical data is one of the fundamental tasks in computational biology. Unsupervised methods make use of the full experimental knowledge, but they are limited by the need to process all the data together, and their outcome is also influenced by the new data. Popular supervised methods use abstractions such as signatures or gene sets to arrive at a statistical score and are considered to be more robust against a change of platform or laboratory.

Our approach aims to unify the implementation of gene set enrichment and signature association methods while utilizing massive prior knowledge. The prior knowledge is represented in a so-called physiospace, which is derived from large sets of heterogeneous public data sources in an unsupervised manner. The physiospace provides patterns that are specific for phenotypes and have low sensitivity to experimental protocols. On the basis of the physiospace, we develop non-parametric tests for associating signatures from cellular lab experiments with clinically relevant phenotypes. The physiospace thus makes it possible to monitor lab experiments against the background of the vast amount of available physiological data, which represents the heterogeneity and high variability of the systems studied in biomedicine.
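To make the idea of a non-parametric signature association concrete, the following is a minimal sketch and not the authors' implementation. It assumes the physiospace can be represented as one reference pattern of per-gene scores for each phenotype, and it substitutes a Mann-Whitney rank-sum z-score (normal approximation, no tie correction) as one possible non-parametric test. All names (`rank_values`, `rank_sum_z`, `associate`, `space`, `query`) and all numbers are hypothetical toy values chosen for illustration.

```python
import math


def rank_values(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(up, down):
    """Mann-Whitney U z-score of `up` versus `down` (normal approximation)."""
    n1, n2 = len(up), len(down)
    ranks = rank_values(list(up) + list(down))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean) / sd


def associate(query, space, k=2):
    """Score a query profile against every reference pattern.

    The k most up- and k most down-regulated query genes form the signature;
    their reference values are compared with a rank-sum test per phenotype.
    """
    genes = sorted(query, key=query.get)
    down_genes, up_genes = genes[:k], genes[-k:]
    return {
        phenotype: rank_sum_z([ref[g] for g in up_genes],
                              [ref[g] for g in down_genes])
        for phenotype, ref in space.items()
    }


# Hypothetical toy reference patterns (phenotype -> gene -> score).
space = {
    "liver": {"g1": 2.0, "g2": 1.5, "g3": 0.1, "g4": -0.2, "g5": -1.4, "g6": -2.1},
    "muscle": {"g1": -1.8, "g2": -1.1, "g3": 0.0, "g4": 0.3, "g5": 1.2, "g6": 2.2},
}
# Hypothetical fold changes from a new lab experiment.
query = {"g1": 3.1, "g2": 2.4, "g3": 0.2, "g4": -0.1, "g5": -2.0, "g6": -2.7}

scores = associate(query, space, k=2)
for phenotype, z in scores.items():
    print(phenotype, round(z, 3))
```

In this toy setting a positive score indicates that the query signature points in the same direction as the reference pattern, and a negative score indicates the opposite direction. Because only ranks enter the statistic, the score does not depend on the absolute scale of the measurements, which is the property that makes such tests attractive when platforms and laboratories differ.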