Just like in the above studies, we focused our attention on protein kinases

Here, we explore how SVMGen and GlideScore are affected by binding pose accuracy. To that end, we create a validation set that consists strictly of proteins whose crystal structure was solved in complex with their inhibitors. For the rank-ordering studies, we use crystal structures from PDBbind along with the corresponding binding affinity data provided in the database. In addition to binding pose, we investigate the effect of using modeled structures for the target on the enrichment performance of SVMGen and GlideScore. To accomplish this, we generate homology models for protein kinases in DUD-E for which crystal structures are available, enabling a comparison of enrichment between modeled and crystal structures. We also generate homology models for kinases in SARfari for which there are many known small-molecule inhibitors but no known crystal structure. These models are used to assess the ability of SVMGen and GlideScore to distinguish between actives and decoys. We focus our work on protein kinases considering the wealth of structural and binding affinity data that exists for this family of proteins.

INTRODUCTION

Structure-based virtual screening is commonly used to enrich chemical libraries to identify active compounds that can serve as tools in chemical biology or as leads for drug discovery.1 A library of small molecules is first docked to a binding site on the structure of a protein, followed by the re-scoring and rank-ordering of the resulting protein-compound structures in a process known as scoring.
Several docking methods have been implemented in widely used computer programs such as AutoDock,2, 3 Glide,4, 5 and Gold.6 Algorithms and scoring methods to predict the binding mode of small molecules have matured significantly, but there is a need for better scoring methods to rank-order protein-compound structures.7 The performance of scoring methods is often target-specific, which has led to a constant need to develop better ones. Several scoring approaches have been developed, including empirical,5, 8 force field,6, 9 and knowledge-based methods.10, 11 Increasingly, scoring methods are using machine learning techniques to improve database enrichment and rank-ordering.12, 13 The performance of scoring approaches in enriching compound libraries is often explored using validation sets such as DUD-E,14 DEKOIS,15 and others.16, 17 These datasets provide a set of actives and matching decoys that are used to test the ability of scoring methods to distinguish actives from decoys. Both actives and decoys are docked to their corresponding target, and the resulting complexes are re-scored. Performance is evaluated using enrichment or receiver operating characteristic (ROC) plots. One limitation of these datasets is that there is generally no crystal structure of the active compounds bound to their corresponding targets, so molecular docking is used to predict the binding mode of active compounds. Considering that docking results in high-quality binding modes in only a fraction of binding sites, it is difficult to determine whether limitations in re-scoring methods are due to lack of accuracy in the binding mode or to inherent limitations in the re-scoring method. The lack of accuracy in docking can also impact the re-scoring of compounds during virtual screening. Ideally, a re-scoring method should favor compounds with correct binding poses.
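To make the evaluation step concrete, the following is a minimal sketch of how ROC area under the curve and an enrichment factor could be computed from a list of re-scored actives and decoys. It is an illustration only, not the procedure used in this work: the function names `roc_auc` and `enrichment_factor`, the convention that a higher score means a more likely active, and the toy scores are all assumptions introduced here.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random active outscores a random decoy.

    Assumption: higher score = predicted more likely active.
    Labels are 1 for actives and 0 for decoys; tied scores count as 0.5.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.01
) -> float:
    """Active rate in the top-ranked fraction divided by the overall active rate."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n_total * fraction))
    # Rank compounds from best (highest) to worst score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Toy data (hypothetical): 3 actives and 7 decoys.
toy_scores = [9.1, 8.7, 7.9, 6.5, 6.0, 5.2, 4.8, 3.3, 2.9, 1.0]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

auc = roc_auc(toy_scores, toy_labels)
ef20 = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
print(f"ROC AUC = {auc:.3f}, EF(20%) = {ef20:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from decoys. The pairwise AUC loop is quadratic in the number of compounds, which is acceptable for a sketch but would normally be replaced by a rank-based computation for large libraries. Scoring functions in which lower values indicate stronger predicted binding would need their scores negated before use here.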
Despite the exponentially growing list of crystal structures, the structures of a majority of proteins in the human proteome have yet to be solved. For example, among the 518 kinases of the human kinome, fewer than half have been solved by crystallography. This poses a significant impediment to the rational design of selective small-molecule kinase inhibitors. Recent studies have shown that even FDA-approved drugs often have a large number of additional targets.18-20 These off-targets may be responsible for the failure of the majority of kinase inhibitors in the clinic, despite the often overwhelming evidence supporting a role of their target in the disease of interest. To address this limitation, recent efforts have concentrated on building homology models for all unsolved kinases of the human kinome.21 A question of interest is how these modeled structures affect scoring and re-scoring performance during virtual screening. Understanding how homology models affect rank-ordering could help to develop better ranking methods for these modeled structures. This will enable the use of all structures of a protein family during virtual screening, which could enhance our ability to identify selective ATP-competitive kinase inhibitors and reduce the failure of drugs in the clinic. Recently, we introduced an approach for re-scoring protein-compound structures that combines knowledge-based potentials with machine learning.22 We called the scoring method SVMSP to highlight the fact that information from the target of interest is used to derive the scoring function. The approach consisted of training a Support Vector Machine (SVM) using knowledge-based potentials as features. These potentials were derived using three-dimensional co-crystal structures from the Protein Data Bank (PDB) for.