
A Multimodal Transformer Network for Protein-Small Molecule Interactions Enhances Predictions of Kinase Inhibition and Enzyme-Substrate Relationships

Alexander Kroll, Sahasra Ranjan, Martin J. Lercher 

Abstract

The activities of most enzymes and drugs depend on interactions between proteins and small molecules. Accurate prediction of these interactions could greatly accelerate pharmaceutical and biotechnological research. Current machine learning models designed for this task have a limited ability to generalize beyond the proteins used for training. This limitation is likely due to a lack of information exchange between the protein and the small molecule during the generation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to simultaneously process protein amino acid sequences and small molecule strings in the same input. This approach facilitates the exchange of all relevant information between the two molecule types during the computation of their numerical representations, allowing the model to account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The resulting predictions outperform recently published state-of-the-art models for predicting protein-small molecule interactions across three diverse tasks: predicting kinase inhibitions; inferring potential substrates for enzymes; and predicting Michaelis constants KM. The Python code provided can be used to easily implement and improve machine learning predictions involving arbitrary protein-small molecule interactions.

Introduction

Predicting interactions between proteins and small molecules is a long-standing challenge in biological and medical research, and it plays a crucial role in the discovery of new drugs and in the understanding of their action [1–12]. Moreover, such predictions are essential for estimating enzyme kinetic parameters [13–17] and for inferring diverse protein functions, such as the binding between enzymes and their substrates [18–26]. The main obstacle to achieving such predictions lies in generating effective numerical representations of the two molecule types that encode all the information relevant to the underlying task. Ideally, these numerical representations should already incorporate information on the molecular and functional interactions between proteins and small molecules. Some recent models have addressed this challenge by facilitating the exchange of information between the small molecule representation and the protein representation during their generation [2, 7, 12]. While these approaches offer the potential to better capture the interplay between the two modalities, they still cannot capture the full complexity of protein-small molecule interactions.

Methods
Implementation details

All software was written in Python [56]. We implemented the multimodal Transformer Network in PyTorch [57] and fitted the gradient boosting models using the XGBoost library [39].
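To make this concrete, the following is a minimal sketch of how a gradient boosting model can be fitted with XGBoost on concatenated protein and small-molecule feature vectors. The feature dimensions, hyperparameters, and random placeholder data are illustrative assumptions, not the exact ProSmith configuration.

```python
# Minimal sketch (not the exact ProSmith setup): fit an XGBoost regressor
# on concatenated protein and small-molecule feature vectors.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)

# Placeholder features standing in for pooled deep learning representations.
protein_features = rng.normal(size=(1000, 1280))   # e.g., pooled ESM-1b embeddings
molecule_features = rng.normal(size=(1000, 768))   # e.g., pooled SMILES embeddings
y = rng.normal(size=1000)                          # e.g., log-transformed KM values

X = np.concatenate([protein_features, molecule_features], axis=1)

model = xgb.XGBRegressor(
    n_estimators=500,
    max_depth=6,
    learning_rate=0.05,
    objective="reg:squarederror",
)
model.fit(X, y)
predictions = model.predict(X)
```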

Calculation of protein token embeddings

We use protein amino acid sequences to represent proteins in the input of the ProSmith Transformer model. Every amino acid in a sequence is represented by a separate token. To numerically encode information about each token, we used learned representations from the ESM-1b model, a Transformer Network with 33 layers that was trained on ∼27 million protein sequences [28]. We applied the trained ESM-1b model to each protein amino acid sequence and extracted the updated 1280-dimensional token representations from the last layer of the model.
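As an illustration, per-residue ESM-1b token embeddings can be extracted with the fair-esm package as sketched below; the example sequence is arbitrary, and the snippet is a minimal sketch rather than the exact ProSmith preprocessing code.

```python
# Minimal sketch: extracting 1280-dimensional per-residue token embeddings
# from the last (33rd) layer of ESM-1b using the fair-esm package.
import torch
import esm

# Load the pretrained ESM-1b model and its tokenizer ("alphabet").
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Arbitrary example sequence; in practice, iterate over all proteins.
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
labels, seqs, tokens = batch_converter(data)

with torch.no_grad():
    results = model(tokens, repr_layers=[33])
token_representations = results["representations"][33]

# Drop the BOS/EOS positions to keep one 1280-dim vector per amino acid.
per_residue_embeddings = token_representations[0, 1:len(seqs[0]) + 1]
print(per_residue_embeddings.shape)  # torch.Size([33, 1280]) for this sequence
```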

Results

Architecture of the multimodal Transformer Network for the prediction of protein-small molecule interactions

As its input, the ProSmith model takes two modalities: a protein amino acid sequence together with a SMILES string that represents the molecular structure of the small molecule. When constructing multimodal Transformer Networks, different architectural designs can be considered, each differing in how information is exchanged between the modalities [34]. This subsection describes the architectural choices made in the development of ProSmith.
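To make the joint-input design concrete, below is a minimal PyTorch sketch of a Transformer encoder that processes protein and SMILES token embeddings within a single input sequence. The dimensions, learned [CLS]/[SEP] tokens, modality type embeddings, and the omission of positional encodings are simplifying assumptions for illustration, not the published ProSmith architecture.

```python
# Minimal sketch (illustrative, not the published ProSmith architecture):
# a Transformer encoder over one joint sequence of protein and SMILES tokens.
import torch
import torch.nn as nn

class MultimodalTransformer(nn.Module):
    def __init__(self, d_model=256, protein_dim=1280, smiles_dim=768,
                 n_heads=8, n_layers=6):
        super().__init__()
        # Project each modality's precomputed token embeddings to a shared width.
        self.protein_proj = nn.Linear(protein_dim, d_model)
        self.smiles_proj = nn.Linear(smiles_dim, d_model)
        # Learned [CLS]/[SEP] tokens and modality type embeddings (assumptions).
        self.cls = nn.Parameter(torch.randn(1, 1, d_model))
        self.sep = nn.Parameter(torch.randn(1, 1, d_model))
        self.type_emb = nn.Embedding(2, d_model)  # 0 = protein, 1 = small molecule
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, protein_tokens, smiles_tokens):
        # protein_tokens: (B, Lp, protein_dim); smiles_tokens: (B, Ls, smiles_dim).
        B = protein_tokens.size(0)
        p = self.protein_proj(protein_tokens) + self.type_emb.weight[0]
        s = self.smiles_proj(smiles_tokens) + self.type_emb.weight[1]
        # One joint sequence: [CLS] protein tokens [SEP] SMILES tokens.
        seq = torch.cat([self.cls.expand(B, -1, -1), p,
                         self.sep.expand(B, -1, -1), s], dim=1)
        hidden = self.encoder(seq)  # positional encodings omitted for brevity
        return hidden[:, 0]         # joint [CLS] representation of the pair

# Example forward pass with random stand-in embeddings.
model = MultimodalTransformer()
joint = model(torch.randn(2, 100, 1280), torch.randn(2, 40, 768))
print(joint.shape)  # torch.Size([2, 256])
```

The essential property is that self-attention operates over the concatenated sequence, so every protein token can attend to every small-molecule token (and vice versa) while the joint representation is computed.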

Discussion

In this study, we introduce ProSmith, a novel machine learning framework for predicting the interactions between proteins and small molecules. The main methodological advance of our model is the utilization of a multimodal Transformer Network that can effectively process amino acid sequences of proteins and SMILES string representations of small molecules within the same input sequence (Fig 1). Because this architecture incorporates information on the interaction between a protein and a small molecule during the generation of the corresponding numerical representation, the trained model is better able to predict drug-target interactions for target protein kinases dissimilar to kinases included in the training data. ProSmith also outperforms previous state-of-the-art methods in predicting protein-small molecule interactions for two further tasks of high relevance to biomedical, biotechnological, and biological research: predicting enzyme substrates and predicting enzyme-substrate affinities (Michaelis constants KM).

Acknowledgments

We thank Martin K. M. Engqvist for interesting discussions on multimodal Transformer Networks. Computational support and infrastructure were provided by the “Centre for Information and Media Technology” (ZIM) at Heinrich Heine University Düsseldorf, Germany.

Citation: Kroll A, Ranjan S, Lercher MJ (2024) A multimodal Transformer Network for protein-small molecule interactions enhances predictions of kinase inhibition and enzyme-substrate relationships. PLoS Comput Biol 20(5): e1012100. https://doi.org/10.1371/journal.pcbi.1012100

Editor: Jeffrey Skolnick, Georgia Institute of Technology, UNITED STATES

Received: January 8, 2024; Accepted: April 24, 2024; Published: May 20, 2024

Copyright: © 2024 Kroll et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All datasets are available from Github at https://github.com/AlexanderKroll/ProSmith and from the Zenodo database at https://doi.org/10.5281/zenodo.8182031.

Funding: This work was funded through grants to MJL by the European Union (ERC AdG “MechSys” – Project ID 101055141) and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation: CRC 1310, and, under Germany’s Excellence Strategy, EXC 2048/1 – Project ID 390686111). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

 
