Protein Sciences in Drug Discovery 2022

Predicting protein expressibility and solubility using protein language models

Thu 3 Nov, 03:05 pm (30 mins)
Where:
The Auditorium
Speaker:

Abstract

During protein production, expressibility and solubility are often limiting factors. Existing strategies to optimize production, such as adjusting the experimental setup or applying codon optimization, are time-consuming. Tools that predict expressibility and solubility for protein purification directly from the protein sequence are therefore needed. However, existing predictors are often built on biased datasets and tuned only for the expression host Escherichia coli, ignoring other industrially important hosts such as Bacillus subtilis. We have shown that deep learning protein language models learn statistical representations of proteins that can be used to select sequence candidates with high solubility and expression potential. In one study, we built a B. subtilis-specific tool that infers the likelihood of successful overexpression. Despite achieving only modest performance values, it can prioritize protein sequences by extracting features related to expression. In a second study, we showed that several existing solubility predictors for E. coli were built on biased data and did not generalize well across multiple datasets. To address this, we introduced a new tool, NetSolP, which achieved state-of-the-art performance on curated existing datasets. Our work shows the potential of language models to accelerate protein production.
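As a rough illustration of how language-model representations can be used to rank candidates, the sketch below mean-pools per-residue embedding vectors into one fixed-length vector per sequence, scores each vector with a logistic model, and sorts the candidates by score. This is a minimal sketch under stated assumptions, not the method behind NetSolP or the B. subtilis tool: the embeddings, the weights, the bias and the helper names (`mean_pool`, `solubility_score`, `rank_candidates`) are all hypothetical. In practice the per-residue vectors would come from a pretrained protein language model and the weights from a trained classifier.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length vector per sequence."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]


def solubility_score(pooled: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic score in (0, 1); higher means 'more likely soluble' under this
    hypothetical linear model."""
    z = bias + sum(w * x for w, x in zip(weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(
    embeddings: Dict[str, Sequence[Sequence[float]]],
    weights: Sequence[float],
    bias: float,
) -> List[Tuple[str, float]]:
    """Score every candidate and return (name, score) pairs, best first."""
    scores = {
        name: solubility_score(mean_pool(emb), weights, bias)
        for name, emb in embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: two hypothetical variants, each with 2 residues x 3 embedding dims.
toy_embeddings = {
    "variant_A": [[0.2, 1.0, -0.5], [0.4, 0.8, -0.3]],
    "variant_B": [[-0.6, 0.1, 0.9], [-0.2, 0.3, 0.7]],
}
toy_weights = [1.5, 0.8, -1.2]  # hypothetical, not learned from real data
ranking = rank_candidates(toy_embeddings, toy_weights, bias=0.0)

if __name__ == "__main__":
    for name, score in ranking:
        print(f"{name}: {score:.3f}")
```

Mean pooling is one simple way to turn variable-length sequences into fixed-size inputs for a classifier. Other pooling schemes or fine-tuning the language model itself are equally plausible choices.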

Hosted By

ELRIG

The European Laboratory Research & Innovation Group
Our Vision: to provide outstanding, leading-edge knowledge to the life sciences community on an open-access basis.
