Science Spyglass
Designing a library of PPI modulators with Generative Artificial Intelligence
Why Generative AI for PPI Modulators?
Protein–protein interactions (PPIs) play a central role in the regulation of cellular signaling, structural organization, and enzymatic function. Dysregulation of specific PPIs has been implicated in a wide range of pathologies, including oncogenesis, neurodegenerative disorders, and infectious diseases.
Despite their biological relevance, PPIs have long posed a challenge for small-molecule drug discovery. Their binding interfaces are typically large, shallow, and hydrophobic, and often involve dispersed hot spots, which makes it difficult to achieve high-affinity, selective binding with conventional small molecules. However, advances in structural biology and computational modeling have begun to redefine the tractability of PPIs.
Among the most promising developments in this space is the application of generative artificial intelligence (Generative AI), which offers powerful new capabilities to explore chemical space and maximize the chance of binding at protein-protein interfaces.

Figure 1. Example of protein–protein interaction (PPI) modulators binding at the interface between two protein units.

Below, we describe how we applied Generative AI to design a library of PPI modulators, called the AXXPPI library. Key AI concepts are briefly explained at the end of the article.
Data curation for PPI modulator design
High-quality data is foundational to the success of any machine learning–driven drug discovery project, and this is especially true when targeting complex systems like protein–protein interactions. Accurate, well-curated datasets enable models to learn meaningful structure–activity relationships and generate chemically plausible, target-relevant molecules.
For this project, we assembled a diverse and representative dataset of known PPI modulators by integrating compounds from several specialized sources, including 2P2Idb, iPPI-DB, Timbal, ChEMBL, and peer-reviewed literature. Each entry was carefully curated to ensure correct annotation of bioactivity, target interface, and molecular structure, enabling downstream modeling to focus on truly relevant chemical features. This rigorous data preparation step is essential not only for training robust models but also for minimizing bias and improving the translational relevance of the predicted molecules.
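As an illustration of the merging step, the sketch below combines records from several sources, drops entries without a bioactivity value, and keeps one record per structure. It is a minimal sketch, not the pipeline used for the AXXPPI library: the `Record` fields, the `curate` function, and the rule of keeping the most potent value on duplicates are all assumptions made for the example, the SMILES are placeholders, and the strings are assumed to have been canonicalized beforehand with a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    smiles: str                   # assumed already canonicalized upstream
    source: str                   # database the entry came from
    activity_nm: Optional[float]  # hypothetical potency field; None if missing


def curate(records: Iterable[Record]) -> list[Record]:
    """Drop entries without bioactivity and keep one record per structure."""
    best: dict[str, Record] = {}
    for rec in records:
        key = rec.smiles.strip()
        if not key or rec.activity_nm is None:
            continue  # incomplete annotation: skip the entry
        kept = best.get(key)
        # Assumed rule: on duplicates, keep the lowest (most potent) value.
        if kept is None or rec.activity_nm < kept.activity_nm:
            best[key] = rec
    return sorted(best.values(), key=lambda r: r.smiles)


# Placeholder entries, not real PPI modulators.
raw = [
    Record("CCO", "ChEMBL", 120.0),
    Record("CCO", "iPPI-DB", 85.0),      # duplicate structure, more potent
    Record("c1ccccc1", "Timbal", None),  # missing activity, dropped
    Record("CCN", "2P2Idb", 300.0),
]
curated = curate(raw)
for rec in curated:
    print(rec.smiles, rec.source, rec.activity_nm)
```

In a real workflow, the deduplication key would more likely be a standardized structure identifier than a raw string, and conflicting activity values would be reviewed rather than resolved automatically.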
Finetuning a pretrained drug-like model
To explore the chemical space of PPI modulators, we employed de novo molecular generation using state-of-the-art generative modeling frameworks. As a starting point, we utilized a model pretrained on large, drug-like chemical libraries, which had learned the general rules of chemical syntax and structure.

Figure 2. Schematic representation of ideal molecules (yellow spheres) within focused (light grey) and general (dark grey) chemical space.
To adapt this model to PPI-relevant scaffolds and molecular features, we applied a transfer learning strategy by fine-tuning it on our curated dataset of known PPI modulators. This domain-specific refinement enabled the generative algorithm to focus on structural motifs, physicochemical properties, and topologies commonly found in PPI-active compounds within the drug-like space.
Nevertheless, further optimization of the fine-tuned model is required to directly incorporate additional design constraints such as synthetic tractability without compromising its ability to generate functionally relevant modulators with promising pharmacological profiles.
De novo generation of optimized molecules
To further guide the de novo generation of PPI modulators toward synthetically tractable and functionally relevant chemical space, we implemented reinforcement learning (RL) with multi-objective optimization. A key component of our reward strategy was a machine learning classifier specifically trained to distinguish PPI modulators from non-PPI compounds, providing a probabilistic measure of PPI-likeness for generated molecules. In addition to this classifier, we incorporated the Synthetic Accessibility Score (SAS) to penalize impractical structures.
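To make the two reward components comparable, they can be placed on a common scale. The sketch below assumes the synthetic accessibility score follows its usual convention of 1 (easy to make) to 10 (hard to make) and rescales it linearly to a reward between 0 and 1. The linear mapping and the function names `sa_to_reward` and `objectives` are assumptions for illustration, and the classifier probability is passed in as a plain number rather than computed.

```python
def sa_to_reward(sa_score: float) -> float:
    """Map an SA score (1 = easy ... 10 = hard) to a reward in [0, 1]."""
    clipped = min(max(sa_score, 1.0), 10.0)  # guard against out-of-range input
    return (10.0 - clipped) / 9.0


def objectives(ppi_probability: float, sa_score: float) -> tuple[float, float]:
    """Return both objectives as values to maximize."""
    if not 0.0 <= ppi_probability <= 1.0:
        raise ValueError("ppi_probability must lie in [0, 1]")
    return (ppi_probability, sa_to_reward(sa_score))


# Hypothetical molecule: classifier probability 0.8, SA score 5.5.
print(objectives(0.8, 5.5))
```

Keeping the objectives as a tuple, rather than collapsing them into one weighted sum, leaves room for a Pareto-based comparison downstream.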
The reinforcement learning algorithm ranked generated molecules based on Pareto efficiency across multiple objectives, ensuring a balance between synthetic tractability and predicted biological relevance. In this framework, the pretrained generative model served as the agent, i.e., the network being actively optimized, while the fine-tuned model acted as the prior, i.e., a non-trainable reference that contributes mutations and maintains the generation of chemically valid and PPI-relevant scaffolds.
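Ranking by Pareto efficiency can be sketched as repeated extraction of non-dominated points: rank 0 is the Pareto front, rank 1 is the front of what remains, and so on. The code below is a generic illustration under the assumption that every objective is to be maximized. The function names and the example scores are invented, and the article does not specify which ranking variant was actually used.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(points):
    """Assign rank 0 to the Pareto front, rank 1 to the next front, and so on."""
    remaining = set(range(len(points)))
    ranks = [0] * len(points)
    rank = 0
    while remaining:
        # Points not dominated by any other point still in play.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Invented (PPI-likeness, synthesizability) pairs, both to be maximized.
scores = [(0.9, 0.2), (0.6, 0.7), (0.3, 0.9), (0.5, 0.5), (0.2, 0.1)]
print(pareto_ranks(scores))
```

The first three molecules each win on at least one objective against every other molecule, so they share rank 0, while the last two are dominated and fall into later fronts.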
This setup allowed us to efficiently navigate the multi-dimensional objective space and converge on high-quality candidate structures with an optimal trade-off between desirability and diversity.

Figure 3. Evaluation of the machine learning classifier’s ability to discriminate known PPI modulators (light green; not used during training) from random compounds (light grey).
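One common way to summarize this kind of discrimination is the area under the ROC curve (AUC), which equals the probability that a randomly chosen positive is scored above a randomly chosen negative. The article does not state which metric was used, so the sketch below is only an illustration: the `roc_auc` helper is written for this example and the scores are invented, not the classifier's actual outputs.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Invented classifier outputs for held-out modulators and random compounds.
held_out_modulators = [0.9, 0.8, 0.4]
random_compounds = [0.5, 0.3, 0.2, 0.1]
auc = roc_auc(held_out_modulators, random_compounds)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two groups.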
Filtering, evaluation and synthesis
Once molecules are generated, they must undergo a rigorous filtering and prioritization process before synthesis. This workflow follows a funnel-like approach, where large numbers of candidate compounds are progressively narrowed down.

Figure 4. Filtering funnel to select a dataset of molecules to be synthesized.
Initially, molecules are removed if they violate medicinal chemistry filters such as REOS or contain undesirable reactive groups. Additional filtering ensures that selected compounds are sufficiently distinct from the training set to avoid rediscovery, while also maximizing chemical diversity. Finally, compounds are evaluated for predicted ADMET properties to ensure acceptable pharmacokinetic and safety profiles.
Only a small, carefully selected subset of molecules meeting all these criteria proceeds to synthesis and experimental validation.
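The funnel can be sketched as a sequence of filters that records how many molecules survive each stage. In this minimal sketch, the medicinal chemistry and ADMET checks are represented by precomputed boolean flags, fingerprints are plain sets of integers, and novelty is measured with the Tanimoto coefficient against the training set. The field names, the 0.7 similarity cutoff, and the example data are all assumptions, and the diversity-maximization step is left out for brevity.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def funnel(candidates, training_fps, max_similarity=0.7):
    """Apply the filters in order and count survivors after each stage."""
    counts = {"generated": len(candidates)}
    stage = [c for c in candidates if c["passes_medchem"]]
    counts["medchem"] = len(stage)
    # Novelty: discard molecules too similar to any training compound.
    stage = [
        c for c in stage
        if all(tanimoto(c["fp"], t) <= max_similarity for t in training_fps)
    ]
    counts["novel"] = len(stage)
    stage = [c for c in stage if c["passes_admet"]]
    counts["admet"] = len(stage)
    return stage, counts


# Invented candidates with toy fingerprints and precomputed filter flags.
training_fps = [frozenset({1, 2, 3, 4})]
candidates = [
    {"id": "A", "fp": frozenset({1, 2, 3, 4, 5}), "passes_medchem": True, "passes_admet": True},
    {"id": "B", "fp": frozenset({1, 2, 8, 9}), "passes_medchem": True, "passes_admet": True},
    {"id": "C", "fp": frozenset({7, 8, 9}), "passes_medchem": False, "passes_admet": True},
    {"id": "D", "fp": frozenset({5, 6, 7}), "passes_medchem": True, "passes_admet": False},
]
selected, counts = funnel(candidates, training_fps)
print([c["id"] for c in selected], counts)
```

Ordering the stages from cheapest to most expensive check keeps the costly predictions confined to molecules that have already passed the simpler filters.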
Use of the PPI library in Axxam HTS
A focused PPI compound library is a valuable resource for high-throughput screening (HTS) campaigns targeting challenging biological interfaces. Unlike traditional approaches to PPI-focused libraries, which often rely on large and lipophilic chemotypes with limited drug-like properties to disrupt or stabilize PPIs, our strategy was to combine the structural features required for engaging protein–protein interfaces with careful optimization of physicochemical and developability parameters. The result is a collection of compounds that retain the 3D topologies and interaction patterns relevant for PPI modulation, while maintaining chemical tractability and favorable drug-like characteristics. This positions the library to deliver high-quality starting points for hit discovery campaigns in the PPI space, increasing their chance of success.
PPI-focused libraries are particularly valuable in therapeutic areas where dysregulated protein–protein interactions play a central role in disease mechanisms. Key areas include:
- Oncology: Numerous cancer pathways are driven by aberrant PPIs, such as p53–MDM2, BCL-2 family interactions, and β-catenin–TCF. These interactions regulate cell cycle progression, apoptosis, and transcription, making them prime targets for small-molecule inhibitors that can restore normal signaling or induce cell death in tumor cells.
- Neurodegenerative diseases: PPIs play a crucial role in driving the process of protein aggregation observed in conditions such as Alzheimer’s and Parkinson’s diseases. Targeting PPIs can both disrupt the early assembly of toxic oligomers and interfere with the further aggregation steps that lead to insoluble fibrils.
- Infectious diseases: Pathogen entry and replication often depend on specific host–pathogen PPIs. For example, HIV gp120–CD4 and SARS-CoV-2 spike–ACE2 interactions are critical for viral entry, and small-molecule inhibitors can block these interfaces to prevent infection.
- Autoimmune and inflammatory disorders: PPIs play a key role in cytokine signaling (e.g., IL-2/IL-2R, TNF-α trimerization) by mediating the assembly and activation of receptor complexes that propagate immune signals. For this reason, they are attractive targets for modulating immune responses. Disrupting these interactions can attenuate pathological inflammation without broadly suppressing immunity.
By combining the AXXPPI library, designed using Generative AI for PPI modulators, with Axxam’s advanced screening platforms, we offer a powerful solution to accelerate the discovery of novel therapeutics targeting challenging protein–protein interactions.
Key concepts: transfer learning and multi-objective optimization
The technologies behind the AXXPPI library — such as transfer learning and reinforcement learning — are central to applying generative AI for PPI modulators. Below, we briefly explain these key concepts to provide additional context for the methods used.
Transfer learning is a machine learning technique in which knowledge gained from training a model on one task is reused to improve performance on a related but distinct task. Instead of training a model from scratch, an existing model — typically trained on a large, general dataset — is fine-tuned using a smaller, domain-specific dataset.
This approach leverages learned representations and accelerates convergence, often leading to improved accuracy and efficiency, especially when data for the target task is limited. Transfer learning is widely used in natural language processing, computer vision, and increasingly in cheminformatics and drug discovery.

Figure 5. Transfer learning scheme.
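The benefit of starting from a pretrained model can be shown with a deliberately tiny example. In the sketch below, a one-parameter model `y = w * x` stands in for the generative network: it is first trained on a "general" task, then fine-tuned for a few steps on a related "domain" task, and compared with a model trained from scratch for the same few steps. The model, data, learning rate, and step counts are all toy assumptions chosen to illustrate the idea, not the setup used for the AXXPPI library.

```python
def fit(w, xs, ys, steps, lr=0.01):
    """Gradient descent on mean squared error for the model y = w * x."""
    n = len(xs)
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0]
general_ys = [2.0 * x for x in xs]  # toy stand-in for the general task
domain_ys = [2.5 * x for x in xs]   # toy stand-in for the related domain task

pretrained = fit(0.0, xs, general_ys, steps=200)      # long pretraining
fine_tuned = fit(pretrained, xs, domain_ys, steps=5)  # short fine-tuning
from_scratch = fit(0.0, xs, domain_ys, steps=5)       # same budget, no pretraining

print(f"pretrained={pretrained:.3f}")
print(f"fine_tuned={fine_tuned:.3f}, from_scratch={from_scratch:.3f}")
```

With the same small training budget on the domain task, the fine-tuned parameter ends up closer to the domain optimum of 2.5 than the one trained from scratch, because it starts from a solution to a related problem.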
Drugs are complex entities with multiple properties defining their behavior in vivo and their therapeutic potential. Finding the optimal range for each of these properties is usually challenging because the properties are numerous and often in conflict. Multi-objective optimization allows researchers to consider and balance several objectives simultaneously. Here, the optimization proceeds over a number of reinforcement learning iterations.
In each iteration, the algorithm designs molecules with the aim of improving a score defined as a combination of several calculated molecular property values. In this way, a set of optimal solutions (the Pareto front) representing trade-offs between different objectives is generated, helping identify compounds that strike the best balance across all desired properties. Researchers can then select the designs that best meet their specific needs and priorities for wet lab validation, ultimately saving time and enhancing rational design.

Figure 6. Multi-objective (Pareto) optimization. A) Pareto front. B) Reinforcement learning scheme. C) R-group design and selection.
The AXXPPI library was developed as part of the AICoS project (CUP B79J23000340005), supported by the Italian Ministry of Economic Development (MISE) under the Innovation Agreements D.M. 31/12/2021.