LIFE @ hulilab.org

Machine Learning-Based Invariant Feature Engineering (LIFE)

Summary

Symmetry refers to properties that remain invariant under mathematical transformations, yet it remains unexplored in biology and medicine. We set out to explore symmetry relationships in gene expression to distinguish between healthy and disease states. We hypothesize that there are relationships between the expression levels of genes that remain invariant across individuals displaying the same biological phenotype. Our Gene Expression Symmetry Hypothesis (GESH) posits that the invariant nature of phenotypic traits in cells is defined by a set of genes exhibiting specific symmetric expression relationships. We deployed a hybrid machine learning approach implemented with two symmetric invariant feature functions (IFFs) to identify Invariant Feature Genes (IFGs), which are gene pairs whose single-value IFF outputs remain invariant across individual samples within each phenotype. Multiclass classification across the transcriptomes of 25 normal organs, 25 cancer types, and blood samples from 4 different types of neurodegenerative diseases identified unique fingerprints. We constructed networks from IFGs (IF-Nets) and found that the hubs of cancer IF-Nets were enriched with approved and clinical-trial drugs, highlighting “symmetry breaking” as a novel treatment approach.
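The idea of an IFG can be illustrated with a small sketch: apply a function to the expression values of a gene pair in every sample of one phenotype, then keep the pairs whose outputs barely vary (for example, a small standard deviation). This page does not give the form of IFF1 or IFF2, so `placeholder_iff` below is a hypothetical stand-in (a log2 ratio), and the function names, the cutoff `max_std`, and the toy data are all assumptions made for illustration, not part of LIFE.

```python
from itertools import combinations
from math import log2
from statistics import pstdev


def placeholder_iff(x, y):
    """Hypothetical stand-in for an IFF (NOT LIFE's IFF1 or IFF2): log2 ratio."""
    return log2(x) - log2(y)


def score_gene_pairs(expression, iff=placeholder_iff):
    """Return {(gene_a, gene_b): standard deviation of IFF outputs across samples}.

    `expression` maps gene name -> list of positive expression values,
    one per sample, with all samples drawn from the same phenotype.
    """
    scores = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        outputs = [iff(a, b) for a, b in zip(expression[gene_a], expression[gene_b])]
        scores[(gene_a, gene_b)] = pstdev(outputs)
    return scores


def select_invariant_pairs(scores, max_std):
    """Keep pairs whose IFF output varies by at most `max_std` (assumed cutoff)."""
    return sorted(pair for pair, sd in scores.items() if sd <= max_std)


# Toy data (made up): G2 is always twice G1, while G3 is unrelated to both.
toy = {
    "G1": [1.0, 2.0, 4.0, 8.0],
    "G2": [2.0, 4.0, 8.0, 16.0],
    "G3": [8.0, 1.0, 4.0, 2.0],
}
scores = score_gene_pairs(toy)
invariant = select_invariant_pairs(scores, max_std=0.1)
```

In this toy case only the pair (G1, G2) survives the cutoff, because its placeholder output is identical in every sample, whereas pairs involving G3 vary widely.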

Symmetric gene expression relationships define a phenotypic trait.
(A) Key concepts for detecting symmetric gene expression relationships across individual samples under a given biological phenotype. Here, the expression values of a gene pair serve as input for an invariant feature function (IFF1 or IFF2), a mathematical function that describes a symmetric relationship between the two genes. If a pair of genes exhibits a symmetric expression relationship, its single-value outputs across samples obtained from different individuals belonging to the same biological phenotype show small variability (i.e., a small standard deviation), in contrast to a gene pair that does not display a symmetric expression relationship. Gene pairs that exhibit symmetric expression relationships in a biological phenotype are called Invariant Feature Genes (IFGs), and they can serve as fingerprints that define a specific biological phenotype. (B) Overall design of Machine Learning-Based Invariant Feature Engineering (LIFE). Two different invariant feature functions, IFF1 and IFF2, are incorporated into LIFE. Transcriptomics data from 25 normal organs and 25 cancer types, and blood transcriptomics data across four neurological disorders, were used in this proof-of-principle study to explore IFGs and their power to classify phenotypes. Multi-class classification with cross-validation was used to assess IFGs as phenotype-specific fingerprints that can robustly assign individual samples to their respective biological phenotypes. In addition, disease-specific networks constructed from IFGs (IF-Nets) are a valuable platform for drug target discovery. AD: Alzheimer's disease; MCI: mild cognitive impairment; IPD: idiopathic Parkinson’s disease; ALS: amyotrophic lateral sclerosis.
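As a rough illustration of how hub genes might be read off a network built from IFGs, the sketch below treats each invariant gene pair as an undirected edge and ranks genes by degree. This page does not describe how IF-Nets are actually constructed or how hubs are defined, so the degree-based ranking, the function name `hub_genes`, and the example pairs are assumptions for illustration only.

```python
from collections import Counter


def hub_genes(invariant_pairs, top_n=3):
    """Rank genes by degree, i.e. the number of invariant pairs they take part in."""
    degree = Counter()
    for gene_a, gene_b in invariant_pairs:
        degree[gene_a] += 1
        degree[gene_b] += 1
    # Sort by degree (descending), then by gene name for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Made-up gene pairs, for illustration only.
pairs = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
hubs = hub_genes(pairs, top_n=2)
```

Here G1 ranks first because it appears in three of the four pairs; a real analysis would likely use a dedicated graph library and a more careful definition of a hub.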
