Compound Libraries Archives

AXXVirtual: a chemistry-driven virtual library for drug discovery

Science Spyglass

Building AXXVirtual: a chemistry-driven virtual library for drug discovery

As in high-throughput screening (HTS), the quality of the screened library plays a crucial role in the success of virtual screening. While virtual screening enables the exploration of much broader and more diverse chemical spaces, many virtual libraries are populated with molecules that, while computationally attractive, are difficult or even impossible to synthesize. For many drug discovery programs, synthesis is therefore a major bottleneck: it can be slow, unpredictable, and resource-intensive, delaying the confirmation of biological activity and the chemical exploration of promising compounds.

This is precisely the gap that the AXXVirtual library was designed to close. Beyond ensuring high-quality, drug-like chemical space, this library of 185 million non-commercial small molecules was built with synthesis feasibility at its core. Every AXXVirtual compound can be produced in just 2–3 steps from readily available building blocks. This design guarantees that virtual hits are not just theoretical possibilities but tangible molecules, accessible within controlled and predictable timelines. As a result, AXXVirtual enables researchers to reach in vitro confirmation faster, accelerating the path from virtual screening to validated hits.

Developed through a structured four-stage process, the compounds were rigorously selected by applying strict rules and filters to guarantee drug-like properties and structural diversity, thereby enabling efficient downstream development.

This article walks you through the principles behind building a high-quality virtual library for drug discovery and shows how these concepts were applied in the design of the 185 million compound AXXVirtual library.

Designing for the real lab: synthetic accessibility

Synthetic feasibility has become a key parameter in the design of virtual libraries, ensuring that computational efforts translate into compounds that can be readily produced for downstream testing [1, 2]. The core of this approach lies in relying on synthetic routes based on established reaction classes that have demonstrated their value over decades, such as amide coupling and the Suzuki–Miyaura reaction. Despite the emergence of new methodologies, these reactions remain the backbone of medicinal chemistry due to their efficiency, reproducibility, scalability, and high yields [3].

Building on these concepts, the AXXVirtual compounds have been designed to be synthesized through twelve synthetic routes, each consisting of two to three steps and together employing nine reliable reactions. The building blocks, more than 12,000 in total, were selected from the inventory of a trusted partner and are immediately available, eliminating delays from external orders. In addition, the reagents have been carefully selected to ensure clean reactions, minimizing side products and regioisomer formation. This combination of proven chemistry and readily available reagents enables fast and efficient synthesis, allowing the preparation of 100–120 compounds within just two to three weeks. To maintain these standards, the library is regularly updated in line with the partner’s inventory, making AXXVirtual a dynamic and continuously evolving library.

Designing smarter: AI-powered property and synthetic feasibility prediction

Artificial intelligence (AI) is playing an increasingly important role in the landscape of virtual libraries for drug discovery by enabling more accurate and efficient predictions of molecular properties and synthetic accessibility. Today, a variety of machine learning models are employed to predict molecular properties with increasing accuracy and speed.
These models rely heavily on large, curated training sets (databases of molecules with known experimental properties) to identify patterns and relationships between molecular features, such as size, chemical groups, and shape, and observed behaviors, such as solubility and toxicity. Unlike simple rule-based methods, machine learning adapts to the complexity and variability inherent in chemical data, which allows it to capture subtle influences and nonlinear effects that traditional rules often miss.

For synthetic accessibility, tools like RAscore (Retrosynthetic Accessibility Score) [8] are widely used. RAscore is a machine learning classifier trained on the outcomes of the retrosynthetic planning software AiZynthFinder. Running a full retrosynthetic analysis for each molecule is impractical when dealing with millions of compounds; RAscore instead provides a rapid estimate of whether a compound is likely to be synthesizable using known building blocks and reaction rules. Applying RAscore to AXXVirtual compounds, we found that the vast majority (96%) scored above 0.8 on the 0-to-1 scale, confirming their high synthetic accessibility. This result further highlights the robustness of the chemistry underpinning our library.

Designing for success: from synthesizable to developable molecules

While synthetic accessibility defines what can be built, drug-likeness defines what is worth pursuing. Virtual libraries should contain compounds that are not only synthetically feasible but also have molecular properties that make them suitable candidates for future development. These include properties that affect solubility, permeability, metabolic stability, and safety.

The concept of drug-likeness is grounded in the empirical observation of properties shared by orally bioavailable drugs.
Large-scale analyses of marketed drugs and clinical candidates have revealed that certain molecular properties, such as moderate size and balanced lipophilicity, are associated with favorable pharmacokinetic behavior. These findings led to the formulation of guidelines, with Lipinski’s Rule of Five (Ro5) [4] and Veber’s rules [5] being among the best known and most widely adopted.

In parallel with physicochemical profiling, the quality of chemical libraries, including virtual ones, must be ensured by excluding compounds known to cause assay interference or unreliable readouts. A major class of such problematic molecules is PAINS (Pan-Assay Interference compoundS): chemical structures prone to react nonspecifically with numerous biological targets rather than specifically affecting one desired target [6]. Rhodanines exemplify the extent of the problem. More than 2,000 rhodanines have been reported to have biological activity in over 400 papers. However, a publication by Bristol-Myers Squibb points out that these compounds undergo light-induced reactions that irreversibly modify proteins. It is hard to imagine how such a mechanism could be optimized to produce a drug or a useful tool [7].

At Axxam, more than 20 years of experience with physical libraries have given us deep insight into selecting the right compounds
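
The drug-likeness guidelines mentioned above can be illustrated with a minimal Python sketch. It is not the AXXVirtual filtering pipeline, whose exact rules are not given here. It assumes the commonly cited thresholds for Lipinski's Rule of Five (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors) and Veber's rules (at most 10 rotatable bonds, polar surface area ≤ 140 Å²), and it tolerates one Ro5 violation, which is a common convention rather than anything stated in the article. The `Descriptors` container and the example values are hypothetical; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    tpsa: float             # topological polar surface area in square angstroms


def ro5_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria the molecule violates."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_veber(d: Descriptors) -> bool:
    """Veber's rules: limited flexibility and limited polar surface area."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


def is_drug_like(d: Descriptors, max_ro5_violations: int = 1) -> bool:
    """Combine both rule sets; the tolerated violation count is an assumption."""
    return ro5_violations(d) <= max_ro5_violations and passes_veber(d)


# Illustrative descriptor values only, not measurements of real compounds.
small_polar = Descriptors(180.2, 1.2, 1, 4, 3, 63.6)
borderline = Descriptors(520.0, 4.0, 2, 6, 6, 90.0)
large_greasy = Descriptors(720.0, 6.5, 3, 12, 14, 150.0)

if __name__ == "__main__":
    for name, d in [("small_polar", small_polar),
                    ("borderline", borderline),
                    ("large_greasy", large_greasy)]:
        print(name, ro5_violations(d), passes_veber(d), is_drug_like(d))
```

Keeping the violation count separate from the pass/fail decision makes it easy to tighten or relax the cut-off for a given project without touching the individual rules.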

Designing PPI modulators with Generative AI

Blog, Compound Libraries / ASolia

Science Spyglass

Designing a library of PPI modulators with Generative Artificial Intelligence

Why Generative AI for PPI Modulators?

Protein–protein interactions (PPIs) play a central role in the regulation of cellular signaling, structural organization, and enzymatic function. Dysregulation of specific PPIs has been implicated in a wide range of pathologies, including oncogenesis, neurodegenerative disorders, and infectious diseases. Despite their biological relevance, PPIs have long posed a challenge for small-molecule drug discovery: their binding interfaces are typically large, shallow, and hydrophobic, and often involve dispersed hot spots, making it difficult to achieve high-affinity and selective binding with conventional small molecules.

However, advances in structural biology and computational modeling have begun to redefine the tractability of PPIs. Among the most promising developments in this space is the application of generative artificial intelligence (Generative AI), which offers powerful new capabilities to explore chemical space and maximize the chance of binding at protein–protein interfaces.

Figure 1. Example of protein–protein interaction (PPI) modulators binding at the interface of two protein units.

Below we describe our application of Generative AI in designing a library of PPI modulators, called the AXXPPI library. Important AI concepts are briefly described at the end of the article.

Data curation for PPI modulator design

High-quality data is foundational to the success of any machine learning–driven drug discovery project, and this is especially true when targeting complex systems like protein–protein interactions. Accurate, well-curated datasets enable models to learn meaningful structure–activity relationships and generate chemically plausible, target-relevant molecules.
For this project, we assembled a diverse and representative dataset of known PPI modulators by integrating compounds from several specialized sources, including 2P2Idb, iPPI-DB, Timbal, ChEMBL, and peer-reviewed literature. Each entry was carefully curated to ensure correct annotation of bioactivity, target interface, and molecular structure, enabling downstream modeling to focus on truly relevant chemical features. This rigorous data preparation step is essential not only for training robust models but also for minimizing bias and improving the translational relevance of the predicted molecules.

Fine-tuning a pretrained drug-like model

To explore the chemical space of PPI modulators, we employed de novo molecular generation using state-of-the-art generative modeling frameworks. As a starting point, we used a model pretrained on large, drug-like chemical libraries, which had learned the general rules of chemical syntax and structure.

Figure 2. Schematic representation of ideal molecules (yellow spheres) within focused (light grey) and general (dark grey) chemical space.

To adapt this model to PPI-relevant scaffolds and molecular features, we applied a transfer learning strategy by fine-tuning it on our curated dataset of known PPI modulators. This domain-specific refinement enabled the generative algorithm to focus on structural motifs, physicochemical properties, and topologies commonly found in PPI-active compounds within the drug-like space. Nevertheless, further optimization of the fine-tuned model is required to directly incorporate additional design constraints, such as synthetic tractability, without compromising its ability to generate functionally relevant modulators with promising pharmacological profiles.

De novo generation of optimized molecules

To further guide the de novo generation of PPI modulators toward synthetically tractable and functionally relevant chemical space, we implemented reinforcement learning (RL) with multi-objective optimization.
A key component of our reward strategy was a machine learning classifier specifically trained to distinguish PPI modulators from non-PPI compounds, providing a probabilistic measure of PPI-likeness for generated molecules. In addition to this classifier, we incorporated the Synthetic Accessibility Score (SAS) to penalize impractical structures. The reinforcement learning algorithm ranked generated molecules based on Pareto efficiency across multiple objectives, ensuring a balance between synthetic tractability and predicted biological relevance. In this framework, the pretrained generative model served as the agent, i.e., the network being actively optimized, while the fine-tuned model acted as the prior, i.e., a non-trainable reference that contributes mutations and maintains the generation of chemically valid and PPI-relevant scaffolds. This setup allowed us to efficiently navigate the multi-dimensional objective space and converge on high-quality candidate structures with an optimal trade-off between desirability and diversity.

Figure 3. Evaluation of the machine learning classifier’s ability to discriminate known PPI modulators (not used during the training process; light green) from random compounds (light grey).

Filtering, evaluation and synthesis

Once molecules are generated, they must undergo a rigorous filtering and prioritization process before synthesis. This workflow follows a funnel-like approach, where large numbers of candidate compounds are progressively narrowed down.

Figure 4. Filtering funnel to select a dataset of molecules to be synthesized.

Initially, molecules are removed if they violate medicinal chemistry filters such as REOS or contain undesirable reactive groups. Additional filtering ensures that selected compounds are sufficiently distinct from the training set to avoid rediscovery, while also maximizing chemical diversity. Finally, compounds are evaluated for predicted ADMET properties to ensure acceptable pharmacokinetic and safety profiles.
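
The three funnel stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow actually used for the AXXPPI library: the `Candidate` fields, the representation of fingerprints as sets of bit indices, the greedy diversity selection, and both similarity cut-offs are hypothetical choices. Tanimoto similarity is used here as one common way to measure how close a molecule is to the training set or to already selected compounds.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A generated molecule with precomputed annotations (hypothetical fields)."""
    name: str
    fingerprint: FrozenSet[int]   # indices of the "on" bits of a fingerprint
    has_reactive_group: bool      # flagged by a medicinal chemistry filter
    admet_ok: bool                # outcome of a predicted ADMET assessment


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two bit sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def run_funnel(
    candidates: Sequence[Candidate],
    training_fps: Sequence[FrozenSet[int]],
    max_train_sim: float = 0.7,   # assumed novelty cut-off
    max_pair_sim: float = 0.6,    # assumed diversity cut-off
) -> Tuple[List[Candidate], Dict[str, int]]:
    """Apply the stages in order and record how many molecules survive each."""
    counts = {"generated": len(candidates)}

    # Stage 1: drop molecules flagged by medicinal chemistry filters.
    stage = [c for c in candidates if not c.has_reactive_group]
    counts["medchem"] = len(stage)

    # Stage 2a: keep only molecules sufficiently distinct from the training set.
    stage = [c for c in stage
             if all(tanimoto(c.fingerprint, t) <= max_train_sim
                    for t in training_fps)]
    counts["novel"] = len(stage)

    # Stage 2b: greedy diversity pick, skipping near-duplicates of earlier picks.
    picked: List[Candidate] = []
    for c in stage:
        if all(tanimoto(c.fingerprint, p.fingerprint) <= max_pair_sim
               for p in picked):
            picked.append(c)
    counts["diverse"] = len(picked)

    # Stage 3: keep molecules with acceptable predicted ADMET properties.
    final = [c for c in picked if c.admet_ok]
    counts["admet"] = len(final)
    return final, counts


# Toy data for illustration only.
training = [frozenset({1, 2, 3, 4})]
pool = [
    Candidate("c1", frozenset({1, 2, 3, 4, 5}), False, True),       # too close to training
    Candidate("c2", frozenset({10, 11, 12, 13}), True, True),       # reactive group
    Candidate("c3", frozenset({20, 21, 22, 23}), False, True),      # survives
    Candidate("c4", frozenset({20, 21, 22, 23, 24}), False, True),  # near-duplicate of c3
    Candidate("c5", frozenset({30, 31, 32}), False, False),         # fails ADMET
]

if __name__ == "__main__":
    selected, stage_counts = run_funnel(pool, training)
    print([c.name for c in selected], stage_counts)
```

Recording the survivor count after each stage mirrors the funnel picture: it shows at a glance which filter removes the most candidates.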
Only a small, carefully selected subset of molecules meeting all these criteria proceeds to synthesis and experimental validation.

Use of the PPI library in Axxam HTS

A focused PPI compound library is a valuable resource for high-throughput screening (HTS) campaigns targeting challenging biological interfaces. Traditional approaches to PPI-focused libraries often rely on large, lipophilic chemotypes with limited drug-like properties to disrupt or stabilize PPIs. Our strategy was instead to combine the structural features required for engaging protein–protein interfaces with careful optimization of physicochemical and developability parameters. The result is a collection of compounds that retain the 3D topologies and interaction patterns relevant for PPI modulation, while maintaining chemical tractability and favorable drug-like characteristics. This makes the library well positioned to deliver high-quality starting points for hit discovery campaigns in the PPI space, increasing their chance of success.

PPI-focused libraries are particularly valuable in therapeutic areas where dysregulated protein–protein interactions play a central role in pathogenesis. Key areas include:

Oncology: Numerous cancer pathways are driven by aberrant PPIs, such as p53–MDM2, BCL-2 family interactions, and β-catenin–TCF. These interactions regulate cell cycle progression, apoptosis, and transcription, making them prime targets for small-molecule inhibitors that can restore normal signaling or induce cell death in tumor cells.

Neurodegenerative diseases:


Copyright © 2026 Axxam S.p.A.