Bio2Vec — neuro-symbolic AI & machine learning with ontologies

01 · Methods

Methods & foundations

Our toolbox for learning from ontologies: the mOWL library and the embedding methods it implements — from geometric models of description logics to graph- and corpus-based representations.

Library & foundations

Our work centres on one idea: ontologies and knowledge graphs are background knowledge that machine-learning models can learn from. mOWL packages the methods below behind a single API, and our tutorials teach the underlying techniques.

mOWL

A Python library for machine learning with ontologies. mOWL maps ontology classes, relations, and instances into vector spaces while preserving the logical axioms, unifying graph-based, syntactic, and model-theoretic embeddings behind one API with direct access to the OWL API and automated reasoning from Python.

Code Paper · Bioinformatics 2023

Machine Learning with Ontologies

Companion code and notebooks for our review of semantic similarity and ontology-based machine learning, reproducing benchmark experiments across semantic similarity, Onto2Vec/OPA2Vec, graph embeddings, and EL Embeddings.

Code Paper · Brief. Bioinform. 2021

Ontology Tutorial

Our hands-on teaching materials on ontologies, automated reasoning, semantic similarity, and combining ontologies with deep learning, developed for courses and summer schools.

Code

Geometric ontology embeddings

These methods build vector spaces that are themselves approximate models of a description-logic theory, so geometry reflects logical entailment. They are the core of our neuro-symbolic research.

EL Embeddings

Geometric embeddings for the description logic EL++ that act as approximate models of the ontology: classes become n-balls and relations TransE-style translations, so subsumption, conjunction, and disjointness are enforced geometrically.

Code Paper · IJCAI 2019
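
To make the ball picture for EL Embeddings concrete, here is a minimal sketch of how containment and separation of n-balls can be turned into penalties. The function names, the `margin` parameter, and the toy values are assumptions for illustration; this is not the mOWL API, and the published method covers more axiom forms than the two shown.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def subsumption_loss(center_c, radius_c, center_d, radius_d, margin=0.0):
    """Penalty for 'C SubClassOf D': zero once ball C lies inside ball D."""
    return max(0.0, distance(center_c, center_d) + radius_c - radius_d - margin)

def disjointness_loss(center_c, radius_c, center_d, radius_d, margin=0.0):
    """Penalty for 'C DisjointWith D': zero once the two balls do not overlap."""
    return max(0.0, radius_c + radius_d - distance(center_c, center_d) + margin)

# Hypothetical toy balls: (center, radius)
small = ((0.0, 0.0), 1.0)
large = ((0.5, 0.0), 2.0)
far = ((5.0, 0.0), 1.0)

inside = subsumption_loss(*small, *large)    # small ball fits inside large ball
outside = subsumption_loss(*large, *small)   # large ball cannot fit inside small ball
apart = disjointness_loss(*small, *far)      # balls are well separated
overlap = disjointness_loss(*small, *large)  # balls overlap
```

Minimising such penalties over all axioms pushes the balls toward a configuration in which geometric containment mirrors subsumption.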

ELBE / EL2Box

Box-shaped EL++ embeddings. Representing concepts as axis-parallel boxes means the intersection of two concepts is again a box, giving the exact intersectional closure that ball-based methods cannot achieve.

Code Paper · arXiv 2022
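
The closure property claimed for boxes can be checked directly: intersecting two axis-parallel boxes only needs a coordinate-wise max and min. The sketch below assumes a (lower corner, upper corner) representation and a hypothetical helper name; the actual ELBE parameterisation may differ.

```python
def intersect_boxes(box_a, box_b):
    """Intersect two axis-parallel boxes given as (lower_corner, upper_corner).

    Returns the intersection box, or None when the boxes do not meet.
    """
    lower = tuple(max(a, b) for a, b in zip(box_a[0], box_b[0]))
    upper = tuple(min(a, b) for a, b in zip(box_a[1], box_b[1]))
    # If any lower bound exceeds its upper bound, the intersection is empty.
    if any(lo > hi for lo, hi in zip(lower, upper)):
        return None
    return lower, upper

# Hypothetical toy boxes in two dimensions
box_a = ((0.0, 0.0), (2.0, 2.0))
box_b = ((1.0, 1.0), (3.0, 3.0))
box_c = ((5.0, 5.0), (6.0, 6.0))

meet = intersect_boxes(box_a, box_b)   # again a box
empty = intersect_boxes(box_a, box_c)  # no overlap
```

The intersection of two balls, by contrast, is a lens shape rather than a ball, which is why ball-based models can only approximate conjunction.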

catE

Lattice-preserving embeddings for the more expressive logic ALC, which supports full negation and universal restrictions. A category-theoretic construction materialises the ontology's concept lattice and embeds it order-preservingly.

Code Paper · NeSy 2024

DELE

Deductive EL++ embeddings: the ontology's deductive closure is folded into training and evaluation, with negative sampling that avoids treating entailed axioms as negatives.

Code Paper · Neurosymbolic AI 2025
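
The negative-sampling idea behind DELE can be sketched in a few lines: before a corrupted axiom is used as a negative, check that it is not entailed. The sketch below is an assumption-laden stand-in: it uses a naive transitive closure over subclass pairs in place of a real reasoner, and all function names and class names are hypothetical.

```python
import random

def transitive_closure(subclass_pairs):
    """Naive closure of (sub, super) pairs under transitivity."""
    closure = set(subclass_pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def sample_negatives(subclass_pairs, classes, k, seed=0):
    """Sample up to k corrupted pairs (sub, other) that are NOT entailed."""
    rng = random.Random(seed)
    entailed = transitive_closure(subclass_pairs)
    candidates = sorted({
        (sub, other)
        for sub, _ in subclass_pairs
        for other in classes
        if other != sub and (sub, other) not in entailed
    })
    return rng.sample(candidates, min(k, len(candidates)))

# Hypothetical toy ontology
pairs = [("Dog", "Mammal"), ("Mammal", "Animal")]
classes = ["Dog", "Mammal", "Animal", "Plant"]
negatives = sample_negatives(pairs, classes, k=10)
```

Without the filter, the entailed pair ("Dog", "Animal") could be drawn as a negative and the model would be trained against a true statement.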

geometric_embeddings

Enhancing geometric EL++ embeddings with negative sampling and deductive-closure filtering, and exposing biases in how knowledge-base-completion benchmarks are framed.

Code Paper · NeSy 2024

GeometrE

Fully geometric multi-hop reasoning on knowledge graphs: every logical operation is a geometric transformation rather than a learned neural operator, with a transitive loss that preserves transitive relations.

Code Paper · ESWC 2026

Graph- & corpus-based embeddings

Our earliest embedding methods turn logical axioms and RDF graphs into corpora or graphs that representation learning can consume. This lineage runs from Walking RDF and OWL through Onto2Vec and the rest of the “2vec” family.

Onto2Vec

Learns joint embeddings of ontology classes and annotated entities by treating logical axioms and their deductive closure as sentences for a Word2Vec model — the first method to apply representation learning to arbitrary OWL axioms.

Code Paper · Bioinformatics 2018

OPA2Vec

Extends Onto2Vec by adding the informal content of ontologies — labels, definitions, synonyms — and an optional literature-pretrained language model, yielding richer vectors for similarity-based prediction.

Code Paper · Bioinformatics 2019

DL2Vec

Converts description-logic axioms into a labelled graph and learns embeddings by random walks; combining phenotype, function, and anatomy ontologies, it links candidate genes to diseases.

Code Paper · Bioinformatics 2021

Walking RDF and OWL

Neuro-symbolic representation learning over RDF knowledge graphs and OWL ontologies: reason to the deductive closure, then run edge-labelled random walks and Word2Vec. The seed method behind much of our later work.

Code Paper · Bioinformatics 2017
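
The walk step in Walking RDF and OWL can be pictured as follows: each walk alternates nodes and edge labels, and the resulting token sequences form the corpus handed to Word2Vec. This is a minimal sketch under stated assumptions: the graph is a plain dictionary assumed to be already closed under inference, and the function names and identifiers are hypothetical.

```python
import random

def random_walk(graph, start, length, rng):
    """Walk up to `length` edges, emitting node, edge label, node, ...

    `graph` maps each node to a list of (edge_label, target_node) pairs.
    """
    walk = [start]
    node = start
    for _ in range(length):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: stop the walk early
        label, node = rng.choice(edges)
        walk.extend([label, node])
    return walk

def build_corpus(graph, walks_per_node, length, seed=0):
    """One list of token sequences, with several walks started at every node."""
    rng = random.Random(seed)
    return [
        random_walk(graph, node, length, rng)
        for node in sorted(graph)
        for _ in range(walks_per_node)
    ]

# Hypothetical toy graph with labelled edges
graph = {
    "geneA": [("has_function", "GO:1")],
    "GO:1": [("subClassOf", "GO:2")],
    "GO:2": [],
}
walk = random_walk(graph, "geneA", 3, random.Random(0))
corpus = build_corpus(graph, walks_per_node=2, length=3)
```

Each sequence in `corpus` can then be treated as a sentence by a skip-gram model, so that nodes appearing in similar graph contexts receive similar vectors.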

Onto2Graph

Infers graph structures from OWL ontologies using automated reasoning, turning complex axioms into edges over the deductive closure for downstream graph analysis.

Code Paper · BMC Bioinformatics 2018

ontology-graph-projections

A systematic study of how different graph projections of ontologies (Onto2Graph, OWL2Vec*, RDF) shape the embeddings learned from them and their ability to infer axioms.

Code Paper · NeSy 2023

vec2SPARQL

Integrates SPARQL querying with vector-space operations, so a single query can mix graph patterns with embedding similarity and machine-learning functions.

Code Paper · SWAT4LS 2018

02 · Applications

Applications in biology & medicine

We put these representations to work on core problems — predicting protein function, ranking disease genes and variants, and uncovering drug and molecular interactions.

Protein function prediction — the DeepGO family

A decade of ontology-aware models for predicting Gene Ontology functions from protein sequence, each generation tightening the link between deep learning and the logical structure of GO.

DeepGO

Predicts Gene Ontology functions from protein sequence and interaction networks with a deep, ontology-aware classifier whose output layer mirrors the GO hierarchy.

Code Paper · Bioinformatics 2018
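
One way to see what a hierarchy-aware output means for DeepGO: a protein predicted to have a specific function should score at least as high for every more general function. The sketch below enforces this by post-hoc score propagation, which is a simplification swapped in for illustration and not the network's actual output layer; the function name and toy classes are assumptions.

```python
def propagate_scores(scores, parents):
    """Raise each ancestor's score to at least the score of its descendants.

    `scores` maps class -> predicted score.
    `parents` maps class -> list of direct superclasses.
    """
    result = dict(scores)
    for cls, score in scores.items():
        stack = list(parents.get(cls, []))
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            result[ancestor] = max(result.get(ancestor, 0.0), score)
            stack.extend(parents.get(ancestor, []))
    return result

# Hypothetical raw scores that violate the hierarchy
raw = {"binding": 0.2, "protein binding": 0.7, "kinase binding": 0.4}
parents = {
    "kinase binding": ["protein binding"],
    "protein binding": ["binding"],
}
consistent = propagate_scores(raw, parents)
```

After propagation, no class scores higher than any of its superclasses, so thresholding the scores always yields a set of annotations closed under the hierarchy.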

DeepGOPlus

Sequence-only function prediction combining a deep convolutional network over the sequence with homology-based annotation transfer; strong CAFA performance.

Code Paper · Bioinformatics 2020
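
The combination of the two evidence sources in DeepGOPlus can be illustrated as a weighted sum of a network score and a homology score. The sketch below is an assumption-based illustration: the bitscore-share weighting, the `alpha` value, the function names, and the toy hits are hypothetical and not taken from the released code.

```python
def homology_scores(hits):
    """Score each GO term by the share of total bitscore supporting it.

    `hits` is a list of (bitscore, set_of_go_terms) for similar proteins.
    """
    total = sum(bitscore for bitscore, _ in hits)
    scores = {}
    if total == 0:
        return scores
    for bitscore, terms in hits:
        for term in terms:
            scores[term] = scores.get(term, 0.0) + bitscore / total
    return scores

def combine(cnn_scores, hom_scores, alpha=0.5):
    """Weighted sum of homology-based and network-based scores per term."""
    terms = set(cnn_scores) | set(hom_scores)
    return {
        term: alpha * hom_scores.get(term, 0.0)
        + (1 - alpha) * cnn_scores.get(term, 0.0)
        for term in terms
    }

# Hypothetical toy inputs
hits = [(100.0, {"GO:A", "GO:B"}), (50.0, {"GO:A"})]
cnn = {"GO:A": 0.4, "GO:C": 0.8}
final = combine(cnn, homology_scores(hits), alpha=0.5)
```

Terms supported by only one source still receive a score, so the network can cover proteins with no close homologues while homology sharpens predictions where similar annotated sequences exist.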

DeepGOZero

Zero-shot function prediction: GO classes are grounded in their logical definitions via model-theoretic EL Embeddings, so functions with no training examples can still be predicted.

Code Paper · Bioinformatics 2022

DeepGO-SE

Frames function prediction as approximate semantic entailment over GO: protein language-model embeddings are evaluated in many approximate models of the GO theory and the truth values aggregated.

Code Paper · Nat. Mach. Intell. 2024

DeepGOMeta

DeepGO for microbial communities: retrained on bacteria, archaea, and phages and paired with a metagenomics pipeline for functional profiling.

Code Paper · Sci. Rep. 2024

PU-GO

Reformulates function prediction as positive-unlabelled ranking, deriving class priors from the GO hierarchy so undiscovered annotations are not penalised as negatives.

Code Paper · Bioinformatics 2024

GO-Agent

An LLM agent that predicts protein function as multi-step reasoning, cross-referencing sequence models, homology, literature, and GO axioms to refine and explain its predictions.

Code Paper · PSB 2026

Genomic context

Predicts bacterial protein function from genomic context alone, pre-training a BERT model over genomes treated as sequences of protein-cluster tokens.

Code Paper · bioRxiv 2024

Phenotype-based gene & variant prioritization

Connecting patient phenotypes to genomes. These tools reason and learn over cross-species phenotype ontologies to rank the variants and genes behind genetic disease.

PhenomeNET-VP

Prioritizes causative variants in exomes and genomes by combining molecular pathogenicity with phenotype similarity computed by reasoning over the PhenomeNET cross-species phenotype ontology. The family includes PVP, DeepPVP, and OligoPVP.

Code Paper · PLOS Comput. Biol. 2017

DeepPheno

Predicts the abnormal phenotypes resulting from single-gene loss of function with an ontology-aware hierarchical classifier over the Human Phenotype Ontology.

Code Paper · PLOS Comput. Biol. 2020

DeepSVP

Prioritizes structural and copy-number variants by relating affected genes to patient phenotypes through ontology embeddings of function, expression, and anatomy.

Code Paper · Bioinformatics 2022

EmbedPVP

Prioritizes coding variants through neuro-symbolic, knowledge-enhanced learning, combining pathogenicity scores with phenotype, function, and anatomy knowledge across a choice of embedding methods.

Code Paper · Bioinformatics 2024

STARVar

Ranks candidate variants from free-text patient symptoms — not only HPO codes — by combining literature text-mining with genomic evidence.

Code Paper · BMC Bioinformatics 2023

INDIGENA

Inductive disease-gene prediction: learns graph embeddings of individual phenotypes and aggregates them on the fly, generalising to unseen diseases where transductive methods cannot.

Code Paper · Bioinformatics 2026

predCAN

Predicts cancer driver genes from biological background knowledge — cellular, functional, and knockout phenotypes embedded with OPA2Vec — rather than mutation frequency.

Code Paper · Sci. Rep. 2019

SMUDGE

Semantic disease-gene embeddings: builds vector representations of gene and disease phenotypes and propagates them to unannotated genes over an interaction network.

Code Paper · Bioinformatics 2018

Drug discovery & molecular interactions

Embedding biomedical knowledge graphs together with sequence and text to predict interactions among drugs, targets, diseases, and pathogens.

multi-drug-embedding

Predicts drug targets and indications by jointly embedding a biomedical knowledge graph and the published literature, combining structured and textual evidence.

Code Paper · PeerJ 2022

DeepViral

Predicts virus-host protein interactions from sequence together with infectious-disease phenotypes and protein functions grounded in ontologies.

Code Paper · Bioinformatics 2021

03 · Knowledge

Knowledge representation & ontology quality

Neuro-symbolic methods are only as reliable as the ontologies beneath them, so we also build the representation patterns and quality-control tools that keep large biomedical ontologies tractable to reason over and free of hidden contradictions.

OntoFunc

An EL++-compatible representation pattern for biological functions that keeps large-scale reasoning over functions tractable, with tooling for function-based ontology analysis.

Code Paper · FOIS 2016

UNMIREOT

Detects, explains, and semi-automatically repairs hidden contradictions that surface when biomedical ontologies are combined — finding that a handful of axioms cause widespread incoherence across the OBO Foundry.

Code Paper · BMC Med. Inform. 2020

04 · Services

Live services & endpoints

Several of our methods run as hosted web services and public APIs you can use directly, without installing anything.

DeepGO → Web server for deep, ontology-aware prediction of protein function from sequence. deepgo.bio2vec.net

AberOWL → Ontology repository offering OWL EL reasoning as a service and semantic search. aber-owl.net

SIDEKICK → Drug-safety knowledge graph combining LLM extraction with Graph-RAG over ontologies. sidekick.bio2vec.net

PAVS → Phenotype-and-variant knowledge graph with a public SPARQL endpoint. pavs.phenomebrowser.net

PathoPhenoDB → Database of pathogens and the disease phenotypes they cause. patho.phenomebrowser.net

Machine learning that reasons with biomedical knowledge.
