____________________________________________________________

Research Projects    

SAnDReS: Statistical Analysis of Docking Results and Scoring functions

SAnDReS draws inspiration from several protein-ligand projects that we have worked on over the last two decades. These projects began in the 1990s with pioneering studies of intermolecular interactions between cyclin-dependent kinase and its inhibitors (De Azevedo et al., 1996; 1997). SAnDReS is a free and open-source (GNU General Public License) computational environment for developing machine-learning models that predict ligand-binding affinity. It is also a tool for statistical analysis of docking simulations and for evaluating the predictive performance of computational models developed to calculate binding affinity. We have implemented machine-learning techniques that generate regression models from experimental binding affinity and scoring functions such as the PLANTS and MolDock scores. The scikit-learn library offers a wide spectrum of supervised machine-learning techniques for regression, such as Stochastic Gradient Descent and Support Vector Regression. SAnDReS was developed in the Python programming language with the SciPy, NumPy, scikit-learn, and Matplotlib libraries. SAnDReS can analyze data obtained from any protein-ligand docking program; the only requirement is that protein structures be in Protein Data Bank (PDB) format, ligands in structure-data file (SDF) format, and docking and scoring function data in comma-separated values (CSV) format. The program has been applied to several datasets composed of crystallographic structures with known ligand-binding affinity, in order to generate scoring functions tailored to the biological system of interest (Xavier et al., 2016).
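The regression workflow described above can be illustrated with a minimal sketch. It is not the SAnDReS implementation: the CSV column names (plants_score, moldock_score, pki), the affinity values, and the helper functions load_scores and fit_models are hypothetical and chosen only for illustration. The sketch reads docking scores from CSV text, fits two scikit-learn regressors, and reports the Spearman correlation between predicted and experimental affinity on a few held-out complexes.

```python
import io

import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical CSV layout: one row per protein-ligand complex, two docking
# scores, and an experimental affinity. All values are invented for the demo.
CSV_TEXT = """pdb_id,plants_score,moldock_score,pki
complex_01,-72.4,-98.3,5.1
complex_02,-80.1,-110.6,5.8
complex_03,-65.9,-91.2,4.6
complex_04,-95.3,-131.7,7.2
complex_05,-88.7,-120.4,6.5
complex_06,-101.2,-142.9,7.9
complex_07,-77.5,-104.8,5.5
complex_08,-92.0,-127.3,6.9
complex_09,-84.6,-115.0,6.1
complex_10,-69.8,-95.7,4.9
complex_11,-98.4,-137.5,7.6
complex_12,-75.2,-101.9,5.3
"""


def load_scores(csv_file):
    """Return (X, y): docking scores as features, experimental affinity as target."""
    data = np.genfromtxt(csv_file, delimiter=",", names=True,
                         dtype=None, encoding="utf-8")
    X = np.column_stack([data["plants_score"], data["moldock_score"]])
    y = data["pki"].astype(float)
    return X, y


def fit_models(X, y):
    """Fit two scikit-learn regressors on standardized docking scores."""
    models = {
        "SGD": make_pipeline(StandardScaler(),
                             SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)),
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0)),
    }
    for model in models.values():
        model.fit(X, y)
    return models


if __name__ == "__main__":
    X, y = load_scores(io.StringIO(CSV_TEXT))
    # Simple hold-out split: first 8 complexes for training, last 4 for testing.
    X_train, y_train, X_test, y_test = X[:8], y[:8], X[8:], y[8:]
    for name, model in fit_models(X_train, y_train).items():
        rho, p_value = spearmanr(y_test, model.predict(X_test))
        print(f"{name}: Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

A real study would use far more complexes and cross-validation rather than a single small hold-out set; the split here only keeps the example short.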


Gallery of Plots Generated By SAnDReS     


Scatter plots and ROC Curve generated by SAnDReS 


The flowchart below illustrates the main steps to integrate a molecular docking program and SAnDReS.

Flowchart for application of SAnDReS to analyze docking results and develop scoring functions. Grey boxes indicate tasks carried out by SAnDReS.


The following biological systems are being investigated using SAnDReS.

-3-enol-pyruvoylshikimate-5-phosphate synthase (EPSP synthase) (EC 2.5.1.19)   PubMed            
-11-Beta-Hydroxysteroid Dehydrogenase (EC 1.1.1.146)   PubMed      
-2014 Benchmark Exercise for Coagulation Factor Xa (EC 3.4.21.6)   PubMed

-Acetylcholinesterase (EC 3.1.1.7)   PubMed   MOTM   
-ADAM 17 Endopeptidase (EC 3.4.24.86)   PubMed         
-Adenosine A2a Receptor (EC 3.2.1.17)   PubMed          
-Adenosine Deaminase (EC 3.5.4.4)   PubMed          
-Adenosylhomocysteinase (EC 3.3.1.1)   PubMed         
-Aldehyde Reductase (EC 1.1.1.21)   PubMed              
-Angiotensin-Converting Enzyme (EC 3.4.15.1)   PubMed           

-Beta-2 Adrenergic Receptor (EC 3.2.1.17)   PubMed          
-Beta-Glucocerebrosidase (or Glucosylceramidase)  (EC 3.2.1.45)   PubMed      
-Beta-Lactamase (or Cephalosporinase) (EC 3.5.2.6)   PubMed   MOTM    
-Beta-Secretase 1 (or Memapsin 2) (EC 3.4.23.46)   PubMed   MOTM        

-Carbonic Anhydrase II (or Carbonic Dehydratase) (EC 4.2.1.1)   PubMed   MOTM       
-Caspase-3 (or Apopain) (EC 3.4.22.56)   PubMed     MOTM
-Catechol O-Methyltransferase (EC 2.1.1.6)   PubMed  
-Chorismate Synthase (EC 4.2.3.5)   PubMed   
-c-Jun N-Terminal Kinase 3 (or Mitogen-Activated Protein Kinase) (EC 2.7.11.24)   PubMed       
-Coagulation Factor X (or Prothrombase) (EC 3.4.21.6)   PubMed         
-Coagulation Factor VIIa (EC 3.4.21.21)   PubMed           
-C-X-C Chemokine Receptor Type 4 (EC 3.2.1.17)   PubMed         
-Cyclin-Dependent Kinase (EC 2.7.11.22)   PubMed         
-Cyclooxygenase-1 and 2 (EC 1.14.99.1)   PubMed   MOTM   
-Cytochrome P450 2C9 (EC 1.14.13.-)   PubMed   MOTM   

-Dihydrofolate Reductase (EC 1.5.1.3)   PubMed   MOTM
-Dihydroorotate Dehydrogenase (Quinone) (EC 1.3.5.2)   PubMed       
-Dipeptidyl Peptidase IV (EC 3.4.14.5)   PubMed       
-Dopamine D3 Receptor (EC 3.2.1.17)   PubMed         
-Dual Specificity Mitogen-Activated Protein Kinase Kinase 1 (EC 2.7.12.2)   PubMed           

-Enoyl-[Acyl-Carrier-Protein] Reductase (NADH) (EC 1.3.1.9)   PubMed          
-Epidermal Growth Factor Receptor ErbB1 (EC 2.7.10.1)   PubMed   MOTM  

-Farnesyl Diphosphate Synthase (EC 2.5.1.10)   PubMed             
-Fibroblast Growth Factor Receptor 1 (EC 2.7.10.1)   PubMed           
-FK506-Binding Protein 1 (EC 5.2.1.8)   PubMed           
-Focal Adhesion Kinase 1 (EC 2.7.10.2)   PubMed           

-GAR Transformylase (EC 2.1.2.2)   PubMed          
-Glucocorticoid Receptor (EC 2.3.1.48)   PubMed         

-Hepatocyte Growth Factor Receptor (or Receptor Protein-Tyrosine Kinase) (EC 2.7.10.1)   PubMed              
-Hexokinase Type IV (or Glucokinase) (EC 2.7.1.2)   PubMed           
-High-Resolution Crystallographic Structures with Delta G Information       
-High-Resolution Crystallographic Structures with Kd Information    
-High-Resolution Crystallographic Structures with Ki Information       
-High-Resolution Crystallographic Structures with IC50 Information    
-Histone Deacetylase 2 (or HDAC) (EC 3.5.1.98)   PubMed         
-Histone Deacetylase 8 (or HDAC) (EC 3.5.1.98)   PubMed        
-HMG-CoA Reductase (or Hydroxymethylglutaryl-CoA Reductase (NADPH) ) (EC 1.1.1.34)   PubMed         
-Human Immunodeficiency Virus Type 1 Integrase (EC 2.7.7.-)   PubMed   MOTM   
-Human Immunodeficiency Virus Type 1 Protease (EC 3.4.23.16)   PubMed   MOTM              
-Human Immunodeficiency Virus Type 1 Reverse Transcriptase (EC 2.7.7.49)   PubMed  MOTM         
-Hydrolases (EC 3.-.-.-)   PubMed        

-Insulin-Like Growth Factor I Receptor (or Receptor Protein-Tyrosine Kinase) (EC 2.7.10.1)   PubMed        
-Inhibitor of Apoptosis Protein (or Magnesium-Importing ATPase) (EC 3.6.3.2)   PubMed       
-Isomerases (EC 5.-.-.-)   PubMed          
 
-Kinases   PubMed         

-Leukotriene-A(4) Hydrolase (or LTA-4 Hydrolase) (EC 3.3.2.6)   PubMed        
-Ligases (EC 6.-.-.-)   PubMed   MOTM             
-Lyases (EC 4.-.-.-)   PubMed        
      
-Macrophage Colony Stimulating Factor Receptor (EC 2.7.10.1)   PubMed
-MAP Kinase-Activated Protein Kinase 2 (or Non-Specific Serine/Threonine Protein Kinase) (EC 2.7.11.1)   PubMed       
-MAP Kinase ERK2 (or Mitogen-Activated Protein Kinase) (EC 2.7.11.24)   PubMed        
-MAP Kinase p38 Alpha  (or Mitogen-Activated Protein Kinase) (EC 2.7.11.24)   PubMed      
-Matrix Metalloproteinase 13 (EC 3.4.24.-)   PubMed       
-Monoamine Oxidase B (or Monoamine Oxidase) (EC 1.4.3.4)   PubMed       
-Muscle Glycogen Phosphorylase (or Amylophosphorylase) (EC 2.4.1.1)   PubMed   MOTM         

-Nitric-Oxide Synthase (EC 1.14.13.39)   PubMed   MOTM         
-Neuraminidase (EC 3.2.1.18)   PubMed   MOTM   

-Oxidoreductases (EC 1.-.-.-)   PubMed         

-Peptide Deformylase (EC 3.5.1.88)   PubMed            
-Peroxisome Proliferator-Activated Receptor Alpha (EC 2.3.1.48)   PubMed     
-Peroxisome Proliferator-Activated Receptor Gamma (EC 2.3.1.48)   PubMed 
-Phosphodiesterase 5A (EC 3.1.4.35)   PubMed         
-Phospholipase A(2) Group IIA (EC 3.1.1.4)   PubMed        
-Poly [ADP-ribose] Polymerase-1 (or NAD(+) ADP-ribosyltransferase) (EC 2.4.2.30)   PubMed            
-Protein Farnesyltransferase/Geranylgeranyltransferase Type I Alpha Subunit (EC 2.5.1.58 or EC 2.5.1.59)   PubMed        
-Protein Kinase C Beta (EC 2.7.11.13)   PubMed            
-Purine Nucleoside Phosphorylase (EC 2.4.2.1)   PubMed

-Renin (or Angiotensin-Forming Enzyme) (EC 3.4.23.15)   PubMed
-Rho-Associated Protein Kinase 1 (EC 2.7.11.1)   PubMed    
-Serine/Threonine-Protein Kinase AKT (EC 2.7.11.1)   PubMed     
-Serine/Threonine-Protein Kinase AKT2 (EC 2.7.11.1)   PubMed    
-Serine/Threonine-Protein Kinase B-raf (EC 2.7.11.1)   PubMed   MOTM     
-Serine/Threonine-Protein Kinase PLK1 (or Polo Kinase) (EC 2.7.11.21)   PubMed     
-Serine/Threonine-Protein Kinase WEE1 (EC 2.7.10.2)   PubMed     
-Serotonin Receptor   PubMed   MOTM    

-Shikimate Kinase (EC 2.7.1.71)   PubMed   
-Stem Cell Growth Factor Receptor (or Receptor Protein-Tyrosine Kinase) (EC 2.7.10.1)   PubMed     

-TGF-Beta Receptor Type I (or Receptor Protein Serine/Threonine Kinase) (EC 2.7.11.30)   PubMed
-Thymidine Kinase (EC 2.7.1.21)   PubMed    
-Thymidylate Synthase (EC 2.1.1.45)   PubMed    
-Thrombin (or Fibrinogenase) (EC 3.4.21.5)   PubMed     
-Transferases (EC 2.-.-.-)   PubMed     
-Trypsin I (or Alpha-Trypsin or Beta-Trypsin) (EC 3.4.21.4)   PubMed      
-Tryptase Beta-1 (EC 3.4.21.59)   PubMed        
-Tyrosine-Protein Kinase ABL (or Non-Specific Protein-Tyrosine Kinase) (EC 2.7.10.2)   PubMed       
-Tyrosine-Protein Kinase JAK2 (or Non-Specific Protein-Tyrosine Kinase) (EC 2.7.10.2)   PubMed 
-Tyrosine-Protein Kinase LCK (or Non-Specific Protein-Tyrosine Kinase) (EC 2.7.10.2)   PubMed  
-Tyrosine-Protein Kinase SRC (or Non-Specific Protein-Tyrosine Kinase) (EC 2.7.10.2)   PubMed  MOTM  

-Urokinase-Type Plasminogen Activator (or U-plasminogen Activator) (EC 3.4.21.73)   PubMed    

-Vascular Endothelial Growth Factor Receptor 2 (or Receptor Protein-Tyrosine Kinase) (EC 2.7.10.1)   PubMed  


Related Links  

     -A Database of Useful Decoys: Enhanced (DUDE)     
     -Enzyme Nomenclature Database (Expasy)     
     -Scikit-learn Machine Learning Techniques for Regression   
     -Matplotlib     
     -NumPy     
     -Protein Data Bank (PDB)     
     -Python      
     -SAnDReS       
     -SciPy      
     -UCI Machine Learning Repository   
     -Wolfram Demonstration Projects for Machine Learning   
     -Wolfram Demonstration Projects for Regression     
     -Wolfram Demonstration Projects for Stochastic Gradient Descent          
 

Evolutionary Algorithms Applied to the Study of Intermolecular Interactions  

This research project aims to study protein-ligand interactions through the application of evolutionary algorithms and empirical scoring functions. Structural information available in the Protein Data Bank (PDB), together with published binding-affinity data, will be used to build training sets for empirical scoring functions that predict binding affinity. These scoring functions will be tuned using the available binding-affinity data, sorted by enzymatic class, which makes each function specific to the molecular system being simulated. In this way, we hope to make a modest contribution to the understanding of the intermolecular interactions between proteins and ligands, a pivotal topic for computer-based drug design.
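As an illustration of how differential evolution can tune an empirical scoring function, the sketch below fits the weights of a linear scoring function to a training set by minimizing the root-mean-square error. It is a minimal sketch under stated assumptions, not the project's code: the linear form of the scoring function, the DE/rand/1/bin variant, the control parameters (f, cr, pop_size, generations), and the synthetic training set are all assumptions made for this example.

```python
import random


def predict(weights, terms):
    """Linear empirical scoring function: w0 + sum(w_i * term_i)."""
    return weights[0] + sum(w * t for w, t in zip(weights[1:], terms))


def rmse(weights, training_set):
    """Root-mean-square error between predicted and reference affinities."""
    squared = [(predict(weights, terms) - affinity) ** 2
               for terms, affinity in training_set]
    return (sum(squared) / len(squared)) ** 0.5


def make_synthetic_training_set(true_weights, n=30, seed=1):
    """Synthetic (terms, affinity) pairs generated from known weights."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        terms = [rng.uniform(-1.0, 1.0) for _ in true_weights[1:]]
        data.append((terms, predict(true_weights, terms)))
    return data


def differential_evolution(cost, bounds, pop_size=20, f=0.6, cr=0.9,
                           generations=300, seed=0):
    """DE/rand/1/bin with in-place replacement; returns (best_vector, best_cost)."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    costs = [cost(ind) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for k in range(dim):
                if rng.random() < cr or k == forced:
                    lo, hi = bounds[k]
                    gene = population[a][k] + f * (population[b][k] - population[c][k])
                    gene = min(max(gene, lo), hi)  # clip to the search bounds
                else:
                    gene = population[i][k]  # binomial crossover keeps the parent gene
                trial.append(gene)
            # Selection: the trial replaces the parent only if it is no worse.
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:
                population[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return population[best], costs[best]


if __name__ == "__main__":
    true_weights = [6.0, -1.5, 0.8]  # hypothetical intercept and two term weights
    training_set = make_synthetic_training_set(true_weights)
    bounds = [(-10.0, 10.0)] * len(true_weights)
    weights, error = differential_evolution(lambda w: rmse(w, training_set), bounds)
    print("fitted weights:", [round(w, 3) for w in weights])
    print("training RMSE:", error)
```

Building one training set per enzymatic class and running the same optimization on each would yield class-specific weights, in the spirit of the tuning strategy described above.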


Keywords: Protein, binding affinity, drug design, evolutionary algorithms, differential evolution


Funding: R$ 120.000,00 (one hundred and twenty thousand reais)
Funding Agency: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), the National Council for Scientific and Technological Development (www.cnpq.br)
Period: From March/2015 to February/2019.
Principal Investigator: Walter F. de Azevedo Jr., Ph.D.
Process Number: 308883/2014-4