Summary

We present a novel feature engineering platform called Learning-based Invariant Feature Engineering (LIFE). The framework aims to detect gene expression activities that are invariant (i.e., robust) across patients with the same disease, regardless of their genetic heterogeneity. LIFE is built upon the Invariant Feature Hypothesis (IF-Hypothesis), which predicts the existence of disease-specific invariant genetic activities. The first hint of the IF-Hypothesis comes from the observation that pathological cells of a given disease type display stable phenotypic traits. This suggests that mechanisms constraining cellular activities to an allowable phenotypic space are at play regardless of genetic background. Another clue comes from machine learning tasks such as image recognition, where robust features are detected from input data. For instance, raw measurements of facial features such as eyes, noses, and mouths vary among individuals, but once they are normalized by the size of each individual's face, invariant features characteristic of human faces emerge.

The LIFE framework encapsulates these principles in two non-correlative mathematical invariant feature functions, IF-Func1 and IF-Func2, which describe the proportional and reciprocal modes of biological interaction, respectively. In a proof-of-concept study, we apply LIFE to 12 different cancer types and illustrate the general existence of invariant genetic activities that impose biological constraints across diverse disease types.
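The two interaction modes can be made concrete with a toy example. The page does not give the actual definitions of IF-Func1 and IF-Func2, so the sketch below rests on an assumption: the proportional mode is stood in for by the per-patient ratio of two genes, and the reciprocal mode by their per-patient product. The function names, the coefficient-of-variation score, and the expression values are all hypothetical and are not the authors' method.

```python
from statistics import mean, pstdev


def ratio_feature(x, y):
    """Hypothetical stand-in for a proportional-mode feature: x_i / y_i per patient."""
    return [a / b for a, b in zip(x, y)]


def product_feature(x, y):
    """Hypothetical stand-in for a reciprocal-mode feature: x_i * y_i per patient."""
    return [a * b for a, b in zip(x, y)]


def coefficient_of_variation(values):
    """Spread relative to the mean; values near zero suggest an invariant feature."""
    return pstdev(values) / abs(mean(values))


# Made-up expression values for four patients (illustration only).
gene_a = [2.0, 4.0, 6.0, 8.0]
gene_b = [1.0, 2.0, 3.0, 4.0]  # rises with gene_a (proportional coupling)
gene_c = [6.0, 3.0, 2.0, 1.5]  # falls as gene_a rises (reciprocal coupling)

cv_raw = coefficient_of_variation(gene_a)
cv_prop = coefficient_of_variation(ratio_feature(gene_a, gene_b))
cv_recip = coefficient_of_variation(product_feature(gene_a, gene_c))

print(f"raw gene_a CV:      {cv_raw:.3f}")
print(f"ratio a/b CV:       {cv_prop:.3f}")
print(f"product a*c CV:     {cv_recip:.3f}")
```

In this toy data each gene varies widely across patients, while the ratio and the product each collapse to a single value. That is the sense of "invariant" intended here; real data would need a tolerance threshold rather than an exact match.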

IF-Net schematic

Overview of the Learning-based Invariant Feature Engineering (LIFE) platform. (A) Schematic of the LIFE components, which involve the detection of invariant features and the downstream classifier, invariance fingerprint, and invariant feature network (IF-Net) construction. Invariant features are identified by two invariant feature functions, IF-Func1 and IF-Func2, based on the non-correlated proportional and reciprocal modes of interaction, respectively. (B) Illustration of the proportional coupled interaction mode of invariant gene pairs, used to design IF-Func1. (C) Illustration of the reciprocal coupled interaction mode of invariant gene pairs, used to design IF-Func2.

Our work shows that every cancer type has its own set of invariant features, i.e., gene pairs whose expression values, when transformed by either IF-Func1 or IF-Func2, yield single-value outputs that are “invariant” across individual patients suffering from the same disease. We also constructed networks of invariant feature genes for each of the 12 cancer types (IF-Nets) and demonstrated that their hub genes are “information encoders”: their expression values are sufficient to reconstruct the expression values of the remaining genes in each IF-Net. Because invariant features are globally constrained across individual patients in a disease state, targeting the activities of these biologically constrained genes is expected to dismantle the stability and functional fitness of disease cells and thus to confer substantial pharmacological benefit. By mapping FDA-approved and clinical trial drugs onto the IF-Nets, we found that a substantial number of drug targets are enriched at hub genes, further supporting the clinical value of invariant feature genes, and IF-Net hub genes in particular, as promising targets for drug intervention.
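As a rough illustration of what "drug targets enriched at hub genes" can mean, the sketch below ranks genes in a toy network by degree and then asks how surprising the overlap between hubs and drug targets is, using a one-sided hypergeometric tail. The page does not state how hubs were defined or which statistical test was used, so both choices are assumptions; the gene names, edges, and drug-target set are invented.

```python
from collections import Counter
from math import comb


def node_degrees(edges):
    """Count how many edges touch each gene in an undirected network."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def top_hubs(edges, k):
    """Return the k highest-degree genes (ties broken alphabetically)."""
    degrees = node_degrees(edges)
    ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:k]]


def hypergeom_pvalue(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` genes without replacement."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return tail / comb(population, draws)


# Invented toy network and drug-target set (illustration only).
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G1", "G5"), ("G2", "G3"), ("G2", "G6")]
drug_targets = {"G1", "G2", "G6"}

all_genes = set(node_degrees(edges))
hubs = top_hubs(edges, k=2)
hub_targets = [g for g in hubs if g in drug_targets]

p_value = hypergeom_pvalue(
    population=len(all_genes),
    successes=len(drug_targets & all_genes),
    draws=len(hubs),
    observed=len(hub_targets),
)
print(hubs, hub_targets, round(p_value, 3))
```

With only six genes the resulting p-value is not meaningful; the point is the shape of the calculation, which scales unchanged to a network of thousands of genes.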

LIFE offers a paradigm shift: exploring disease etiology from a biological constraint perspective. Our approach also opens a new strategy for drug target discovery that focuses on highly constrained genetic activities in disease as candidates for intervention, rather than on mutated genes as druggable targets. Such a strategy has the potential to overcome drug inefficacy and avoid a mutational arms race.

Results

Explore drug target networks of IFFUN1 and IFFUN2 here.

Download

Coming soon.

Support

Coming soon.

Citation

Manuscript in preparation.

© 2016 H Li • All Rights Reserved