A comparable analysis using only the expected persistent interaction features correctly classified only 76/220 (34.5%) of real antibodies as binders but also misclassified only 12/8194 (0.146%) of poorly docked complexes as likely binders. [...] during the simulations. It was found that only the hydrogen bonds in which both residues are stabilized in the bound complex are expected to persist and meaningfully contribute to binding between the proteins. In contrast, stabilization was not a requirement for salt bridges and hydrophobic interactions to persist. Still, interactions in which both residues are stabilized in the bound complex persist significantly longer and have significantly stronger energies than other interactions. Two hundred and twenty real antibody-protein complexes and 8194 decoy complexes were used to train and test a random forest classifier using the features of expected persistent interactions identified in this study and the macromolecular features of interaction energy (IE), buried surface area (BSA), IE/BSA, and shape complementarity. It was compared to a classifier trained only on the expected persistent interaction features and another trained only on the macromolecular features. Inclusion of the expected persistent interaction features reduced the false positive rate of the classifier by two- to fivefold across a range of true positive classification rates.

Keywords: antibodies, antigens, binding proteins, molecular dynamics simulation, protein binding

== 1. Introduction ==

Recent years have seen a veritable explosion in the publication of machine learning (ML) protein design methods [1,2,3,4], including for binding proteins [5,6]. This is a natural progression after the great successes that ML models had for the protein structure prediction problem, starting with AlphaFold2 [7] and RoseTTAFold [8].
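The classification rates quoted in the abstract follow directly from the reported counts. As a minimal sketch (the helper name `rate` and the variable names are illustrative and not taken from the study), the arithmetic can be checked as follows:

```python
def rate(count: int, total: int) -> float:
    """Return count/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total

# Counts reported in the text for the classifier trained only on
# the expected persistent interaction features.
true_positive_rate = rate(76, 220)    # real antibodies classified as binders
false_positive_rate = rate(12, 8194)  # decoys misclassified as likely binders

print(f"TPR = {true_positive_rate:.1f}%")   # TPR = 34.5%
print(f"FPR = {false_positive_rate:.3f}%")  # FPR = 0.146%
```

The two rates describe opposite ends of the same trade-off: this classifier recovers only about a third of the real complexes, but it admits very few decoys.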
While ML-based methods can correctly predict the structures of most proteins, the majority of ML-designed binders fail when experimentally tested. The highest experimental success rate for such methods that the authors are aware of is 19%, as reported for RFdiffusion [5]. While this is an orders-of-magnitude improvement over prior approaches, it is evident that such methods still predict an abundance of false positive complexes and that computational tools still struggle to distinguish the features of real protein-protein binding complexes from false ones [9,10,11,12,13,14,15]. AlphaFold2 and RoseTTAFold have in common that one of the features their algorithms consider is residue-residue distance information (i.e., protein contact maps). Although ESMFold [16] has demonstrated that protein structures can be accurately predicted based on sequence information alone, AlphaFold2 and RoseTTAFold have shown that residue-residue interaction information can contribute to accurate ML protein prediction algorithms. Thus, considering how amino acids interact in protein interfaces is likely important to the successful design of binding proteins. Prior literature has identified common features of protein binding interfaces, including the importance of pre-stabilized hotspot residues [17,18,19,20,21,22,23,24,25]. Hotspot residues are those that contribute a disproportionately large percentage of a complex’s binding energy; they are pre-stabilized when they have limited degrees of freedom prior to binding.
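To make the notion of a protein contact map concrete, the following is a minimal sketch of how one could be derived from residue coordinates. It is an illustration only and does not reproduce how AlphaFold2 or RoseTTAFold compute or consume distance features. The function name `contact_map`, the 8 Å cutoff (a commonly used convention, assumed here), and the toy coordinates are all assumptions made for this example.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Return a symmetric boolean matrix that is True where the
    representative atoms of two different residues lie within
    `cutoff` angstroms of each other (assumed cutoff, not from the text)."""
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts[i][j] = contacts[j][i] = True
    return contacts

# Toy coordinates (angstroms) for four hypothetical residues on a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(toy_coords)

for row in cmap:
    print("".join("X" if in_contact else "." for in_contact in row))
```

In this toy case the first three residues are all in mutual contact and the fourth is isolated. Self-contacts on the diagonal are deliberately left as False.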
Our own work has demonstrated outcomes relevant to this problem: that consideration of interaction properties rather than amino acid identities improves computational predictions [26]; that the effects of mutations in antibody-protein interfaces differ from their effects on protein structures [27]; and that consideration of pre-stabilization information for pairwise interactions in protein binding interfaces can improve the experimental success rate of ML-designed protein binders [28]. Considering all of these observations, we hypothesized that there are stability-related properties of interactions at protein-protein interfaces that can help computational methods eliminate false complexes from consideration. Antibodies are the archetypical binding proteins. There is a known directionality to their binding (i.e., antibodies evolve to bind to antigens), and the properties of their structures and modes of binding have been extensively studied due to their importance in medicine and widespread use in experimental protocols. Specifically, antibodies bind using six modular loops known as complementarity determining regions (CDRs), with three in each of the two variable domains that compose their binding sites [29]. The understanding of these properties has allowed several experimentally validated algorithms to be developed specifically for designing antibodies [30,31,32,33,34]. Antibodies are a promising system in which to study the features of interactions that are important to protein binding because interactions in affinity-matured antibody interfaces are unambiguously part of a complex evolved for maximum affinity. Molecular dynamics (MD) simulations have been used to study many protein complexes over varying time scales [35,36,37,38,39,40,41,42,43,44,45,46,47,48,49].
Although it is common practice to assess the accuracy of computationally predicted protein complexes by comparing them to experimentally determined structures [12,13,15], it is possible that important features of interactions might only be revealed through study of their dynamic behaviors. In this study, short (5 ns) MD simulations of 20 antibody-protein complexes were conducted to study their behavior around the local minima of their experimental structures and [...]