From AI to real life.

Our team has spent a decade building and testing artificial intelligence pipelines to confirm whether they work... and they do!

madi protein

Performance

Several algorithms have been developed to uncover the underlying constraints of the evolutionary process, which can be used to infer how tolerable and favorable mutations are. However, these methods lack accuracy, respond slowly, and require high investments in protein engineering campaigns, because they do not account for the fitness data of variants tested during directed evolution. Recent research has revealed that evolutionary information, such as Multiple Sequence Alignments (MSAs) of homologous and analogous proteins, carries useful signals.

This motivates integrating evolutionary information from protein sequences to guide supervised models that predict intrinsic protein features and the fitness of protein variants under specific conditions. The result is more effective protein engineering campaigns and learnings delivered to users closer to real time.
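To make "evolutionary information from an MSA" concrete, the sketch below computes a simple site-independent log-odds score: how much more (or less) common a mutant residue is than the wild-type residue at the same alignment column. This is a swapped-in textbook technique, not madi™'s method. It assumes, hypothetically, that every MSA row is already aligned to the wild type with no insertions (so column *i* corresponds to wild-type position *i*), and the function names are illustrative. A score like this could serve as one input feature for a supervised fitness model.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(msa, position, pseudocount=1.0):
    """Amino-acid frequencies at one alignment column, smoothed by a pseudocount.

    Gaps ('-') and non-standard letters are ignored.
    Assumption: all rows have the same length as the wild type.
    """
    counts = Counter(seq[position] for seq in msa if seq[position] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def mutation_score(msa, wild_type, mutation, pseudocount=1.0):
    """Log-odds score for a mutation written like 'T3S' (1-based position).

    Positive: the mutant residue is more common than the wild-type residue
    at that column. Negative: it is rarer, hinting the mutation is disfavored.
    """
    wt_aa, pos, mut_aa = mutation[0], int(mutation[1:-1]) - 1, mutation[-1]
    if wild_type[pos] != wt_aa:
        raise ValueError(f"wild type has {wild_type[pos]} at position {pos + 1}, not {wt_aa}")
    freqs = column_frequencies(msa, pos, pseudocount)
    return math.log(freqs[mut_aa] / freqs[wt_aa])

# Toy alignment (hypothetical sequences, for illustration only).
msa = ["MKTAY", "MKSAY", "MKSAF", "MRSAY"]
wild = "MKTAY"
print(mutation_score(msa, wild, "T3S"))  # S dominates column 3, so the score is positive
print(mutation_score(msa, wild, "K2W"))  # W never appears in column 2, so the score is negative
```

Site-independent scores ignore interactions between positions; models that capture such couplings (or deep models trained on sequences) are the usual next step.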

Case #1

Protein fitness benchmark

madi™ uses deep learning architectures that predict protein features and then optimize them. These architectures were tested against the state of the art in an extensive benchmark, outperforming it in most conditions.


Spearman correlation of madi™ vs. the state-of-the-art model across multiple protein engineering tasks. madi™ performs better in 68 of 72 independent experiments, with an average Spearman correlation of 0.65, compared with 0.45 for the state of the art.
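Spearman correlation measures how well a model ranks variants, which is what matters when choosing which ones to test in the lab: 1.0 means the predicted order matches the measured order exactly, and 0 means no monotonic relationship. The sketch below computes it in plain Python (Pearson correlation of the rank vectors, with tied values given their average rank) to show the calculation; `scipy.stats.spearmanr` does the same job in practice. The input values are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(predicted, measured):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all values are equal")
    return cov / (sx * sy)

# Hypothetical predicted vs. measured fitness for five variants.
print(spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]))
```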

Case #2

Performance improvement using high-order mutations

madi™ can be trained on deep mutational scanning data of single point mutations and extrapolate to high-order mutants (variants carrying more than 2-3 mutations). In this case, the algorithm was used to optimize the fluorescence of the well-known green fluorescent protein (GFP).


The top variants achieved 4 times the activity of the wild type.
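For context on extrapolating from single mutants to high-order mutants, the simplest possible approach is an additive model: predict a multi-mutant's effect as the sum of its single-mutant effects and rank the combinations. This ignores epistasis (interactions between mutations), which is exactly what learned models such as madi™'s deep learning architectures aim to capture, so treat it as a baseline sketch and not as madi™'s method. The mutation labels and effect values below are hypothetical toy data, and labels are assumed to follow the `A1V` convention (wild-type residue, 1-based position, mutant residue).

```python
from itertools import combinations

def additive_score(single_effects, variant):
    """Predict a multi-mutant's effect as the sum of its single-mutant effects.

    single_effects maps a mutation label (e.g. 'A1V') to a measured effect,
    such as a log fitness ratio versus wild type from a DMS experiment.
    """
    return sum(single_effects[m] for m in variant)

def rank_combinations(single_effects, order, top_k=5):
    """Score every combination of `order` mutations at distinct positions; return the best."""
    def position(mutation):
        return int(mutation[1:-1])

    # Skip combinations that mutate the same position twice.
    candidates = [
        combo for combo in combinations(sorted(single_effects), order)
        if len({position(m) for m in combo}) == order
    ]
    scored = [(additive_score(single_effects, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Hypothetical single-mutant effects (toy values, not real GFP data).
single = {"A1V": 0.4, "A1G": -0.2, "K2R": 0.3, "L3F": 0.1}
for score, combo in rank_combinations(single, order=2, top_k=3):
    print(combo, round(score, 3))
```

The number of combinations grows combinatorially with the mutation order, which is one reason learned models are typically paired with guided search rather than exhaustive enumeration.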

Discovery

madi’s protein discovery implementation uses unsupervised and supervised deep learning architectures to create embedded representations (embeddings) of protein sequences. These embeddings accurately preserve sequential, chemical, and structural information, enabling faster searches than current algorithms. Remarkably, our AI architecture can detect functional patterns in very-low-homology proteins, even those with less than 30% sequence identity. This unique property emerges from the well-preserved information in our embeddings. It matters because a protein's function is determined by its structure, while the range of possible structures is in turn constrained by function (protein structural and functional analogy).
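Once each protein is represented as an embedding vector, searching for functional analogs becomes a nearest-neighbor lookup in that vector space instead of a sequence alignment. The sketch below ranks database proteins by cosine similarity to a query embedding. It is a minimal illustration under stated assumptions: the embeddings are taken as given (how madi™ computes them is not shown here), the vectors and protein identifiers are hypothetical, and brute-force comparison stands in for whatever index a production system would use.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

def nearest_neighbors(query, database, k=3):
    """Return the k (similarity, protein_id) pairs most similar to the query.

    database maps a protein identifier to its embedding vector.
    """
    scored = [(cosine_similarity(query, emb), pid) for pid, emb in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of dimensions.
database = {"prot_a": [1.0, 0.0, 0.0], "prot_b": [0.9, 0.1, 0.0], "prot_c": [0.0, 1.0, 0.0]}
print(nearest_neighbors([1.0, 0.0, 0.0], database, k=2))
```

Brute force costs one comparison per database entry; for large databases, approximate nearest-neighbor indexes trade a little recall for much faster queries.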

Case #1

Low identity analogous proteins

With a well-known enzyme as a starting point, madi™ discovered low-identity functional proteins. The top candidates obtained with the model were tested in the laboratory to confirm activity.


Analogous enzymes were found with as little as 41% sequence identity while retaining activity.
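Percent sequence identity has several definitions that differ mainly in the denominator (aligned columns, alignment length, or the shorter sequence's length), and the page does not say which one was used. The sketch below assumes one common choice: identical residues divided by the number of gap-free columns in an existing pairwise alignment. The alignment step itself is assumed to have been done already, and the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Identity is counted only over columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)

# Hypothetical aligned pair: 3 of the 5 gap-free columns match.
print(percent_identity("MKT-AY", "MKSGAF"))
```

Because the denominator changes the number, identity values reported by different tools are only comparable when they use the same definition.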

Case #2

Discovery of high-performing functional proteins

Starting from a known antifungal protein that performed poorly in the target application, the algorithm discovered previously unknown, better-performing proteins, which were then experimentally validated.


Analogous proteins were found with as little as 39% sequence identity while retaining activity.

Production

madi’s protein production prediction engine uses deep learning techniques to predict the expression levels of a protein in specific microbial hosts. The model processes protein sequence and structure data and is trained on a large dataset of recombinant protein expression. The method is optimized for comparing the expression levels of multiple proteins in the same system, allowing quick optimization to maximize recombinant expression. This revolutionary technology has the potential to usher in a new era of protein development.
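Because the engine is described as optimized for comparing proteins within the same expression system, a natural way to evaluate such a model is to check only same-host pairs: for each pair of proteins expressed in one host, does the model order them the same way the measurements do? The sketch below computes that within-host pairwise accuracy (a concordance-style metric). It is a hypothetical evaluation, not a metric the page attributes to madi™, and the host names and values are toy data.

```python
from collections import defaultdict
from itertools import combinations

def within_host_pairwise_accuracy(records):
    """Fraction of same-host protein pairs whose predicted order matches the measured order.

    records: iterable of (host, predicted_expression, measured_expression).
    Pairs from different hosts are never compared; pairs with tied
    measurements are skipped; a tied prediction counts as half correct.
    """
    by_host = defaultdict(list)
    for host, predicted, measured in records:
        by_host[host].append((predicted, measured))

    correct = total = 0
    for items in by_host.values():
        for (p1, m1), (p2, m2) in combinations(items, 2):
            if m1 == m2:
                continue
            total += 1
            if p1 == p2:
                correct += 0.5
            elif (p1 > p2) == (m1 > m2):
                correct += 1
    return correct / total if total else float("nan")

# Hypothetical predictions and measurements in two hosts.
records = [
    ("E. coli", 0.9, 5.0), ("E. coli", 0.2, 1.0), ("E. coli", 0.5, 3.0),
    ("P. pastoris", 0.1, 2.0), ("P. pastoris", 0.3, 1.0),
]
print(within_host_pairwise_accuracy(records))
```

Restricting comparisons to one host sidesteps differences in absolute expression scales between systems, which is why a ranking metric fits this use case better than an absolute error.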

Case #1

Protein production optimized using AI



madi™ was tested against state-of-the-art models for protein expression prediction, achieving higher accuracy in all tests as well as faster inference.

The model also generalizes beyond individual proteins, making it adaptable to different protein engineering projects.


Ready to join the protein revolution?