Taba: A Tool to Analyze the Binding Affinity 


The Science
The basic idea behind Taba is that the structural features that determine ligand-binding affinity are already imprinted in the three-dimensional structures of protein-ligand complexes. An ensemble of crystallographic structures for which ligand-binding affinity data are available provides the raw data that Taba uses to generate a target-based polynomial scoring function. To build this scoring function, Taba reads all structures available for a biological system of interest and calculates the average distance for each type of atom pair. For instance, consider intermolecular Carbon-Carbon distances, where one Carbon belongs to the protein and the other to the ligand. Taba calculates the average intermolecular distance for the Carbon-Carbon pair and takes this length as the equilibrium distance for that pair, by analogy with a mass-spring system. For a given structure, displacement from this equilibrium distance increases the energy of the system. We modeled our protein-ligand interactions following this naïve analogy, as illustrated in the figure below.

Protein-ligand complex as a mass-spring system. We used the atomic coordinates of the CDK2-roscovitine complex (PDB: 2A4L) (De Azevedo et al., 1997).
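
The averaging step can be sketched in Python as follows. This is a minimal illustration, not Taba's actual code: the function names, the toy coordinates, the 4.5 Å cutoff, and the choice to pool distances from all structures before averaging are assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import product


def pair_distances(protein_atoms, ligand_atoms, cutoff=4.5):
    """Collect protein-ligand distances (in Å) per element pair, e.g. ('C', 'C').

    Each atom is a tuple (element, (x, y, z)). Pairs farther apart than
    `cutoff` are ignored (hypothetical default).
    """
    distances = defaultdict(list)
    for (p_elem, p_xyz), (l_elem, l_xyz) in product(protein_atoms, ligand_atoms):
        d = math.dist(p_xyz, l_xyz)
        if d <= cutoff:
            distances[(p_elem, l_elem)].append(d)
    return distances


def equilibrium_distances(complexes, cutoff=4.5):
    """Average distance per element pair over an ensemble of complexes."""
    pooled = defaultdict(list)
    for protein_atoms, ligand_atoms in complexes:
        per_pair = pair_distances(protein_atoms, ligand_atoms, cutoff)
        for pair, ds in per_pair.items():
            pooled[pair].extend(ds)
    return {pair: sum(ds) / len(ds) for pair, ds in pooled.items()}


# Toy ensemble of two "complexes" with made-up coordinates.
complexes = [
    ([("C", (0.0, 0.0, 0.0)), ("O", (0.0, 3.0, 0.0))], [("C", (3.0, 0.0, 0.0))]),
    ([("C", (0.0, 0.0, 0.0))], [("C", (4.0, 0.0, 0.0))]),
]
eq = equilibrium_distances(complexes)
print(eq[("C", "C")])  # average of the two Carbon-Carbon distances, 3.0 and 4.0
```

The key of each entry is (protein element, ligand element), so a protein Oxygen near a ligand Carbon is stored separately from a protein Carbon near a ligand Oxygen.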

For each pair of atoms, Taba calculates the average intermolecular distance. These distances are taken as the equilibrium distances: there is one equilibrium distance for the Carbon-Carbon pair, another for the Carbon-Oxygen pair, and so on. The animated figure below shows the oscillation of a mass-spring system. Displacement from equilibrium generates a restoring force that drives the system back in the opposite direction, producing harmonic motion.


Mass-spring system in undamped oscillation (the animation above was generated with Mathematica; the code is available here).
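
For reference, the textbook relations behind this analogy are given below. They are the standard Hooke's-law expressions, not a formula taken from Taba itself: the symbols d (observed distance), d_0 (equilibrium distance), k (spring constant), m (mass), A (amplitude), and φ (phase) are introduced here for illustration only.

```latex
E(d) = \tfrac{1}{2}\, k\, (d - d_0)^2 ,
\qquad
F(d) = -\frac{\mathrm{d}E}{\mathrm{d}d} = -k\, (d - d_0)

d(t) = d_0 + A \cos(\omega t + \varphi),
\qquad
\omega = \sqrt{k / m}
```

The energy is zero at d = d_0 and grows with the square of the displacement in either direction, which is the sense in which a departure from the average distance increases the energy of the system.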

As highlighted above, Taba needs an ensemble of crystallographic structures for which ligand-binding affinity is known. This set of structures is used to train the model. In the first round, Taba calculates the average distance for each pair of atoms. In the second round, Taba applies supervised machine learning techniques to determine the relative weight of each type of atom pair. Taba takes the intermolecular distances for each pair of atoms as explanatory variables. The response variable is the log of the binding affinity, for instance, log(Ki), where Ki is the inhibition constant. Taba considers the following atoms from the protein structure: C, N, O, S, and P. For the ligands, Taba uses the following atoms: C, N, O, S, F, Cl, Br, I, and P.
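
The second round can be sketched with scikit-learn as below. This is only an illustration under stated assumptions: the numbers are synthetic, the variable names are hypothetical, and the use of squared displacements with an ordinary least-squares model is a choice suggested by the spring analogy, not a confirmed description of the model Taba fits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical atom-pair types (protein element - ligand element).
pair_types = ["C-C", "C-N", "C-O"]

# Assumed equilibrium (average) distances in Å from the first round.
equilibrium = np.array([3.9, 3.7, 3.6])

# Synthetic training table: one row per complex, one column per pair type,
# holding the mean intermolecular distance observed in that complex (Å).
mean_distance = np.array([
    [4.1, 3.6, 3.9],
    [3.8, 3.9, 3.5],
    [4.4, 3.7, 3.3],
    [3.9, 4.0, 3.8],
    [4.0, 3.5, 3.6],
])

# Synthetic response variable: log(Ki) for each complex.
log_ki = np.array([-7.2, -8.1, -6.4, -7.6, -7.9])

# Explanatory variables: squared displacement from equilibrium (assumption).
X = (mean_distance - equilibrium) ** 2

# Fit the relative weight of each pair type.
model = LinearRegression().fit(X, log_ki)
weights = dict(zip(pair_types, model.coef_))
predicted = model.predict(X)

for name, w in weights.items():
    print(f"{name}: weight = {w:.3f}")
```

Each fitted coefficient plays the role of a spring constant for one pair type, and the intercept absorbs the baseline affinity of the dataset.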

The Experiment

Taba adopts a specific concept of experiment. In Taba, an experiment is a set of files: structures in the Protein Data Bank (PDB) format (Berman et al., 2000), data with the PDB access codes, a ligand-binding information file, a configuration file, the transformed files for regression, and the resulting files. When we refer to an experiment, we therefore mean the set of data generated for a set of PDB entries of a particular protein family and their associated records. Every experiment has its own folder, with a name given by the user.

The Tool

Taba has a main screen, where you can select the desired task, and six other screens with various functionalities. In addition to these feature screens, there is a help screen and another with general information about Taba. To run an experiment, follow the order of the buttons on the main screen, from left to right.

The Main Features of Taba

Experiment Management: Before starting any experiment, you need to open this screen, which lets you save the current experiment, open an existing experiment, or delete the current experiment. Before deleting an experiment, check whether you need to save it first.

Downloading PDB files: This feature allows the user to download data from the PDB site (Berman et al., 2000). Taba can download two types of files: the PDB file with the atomic coordinates and a second file with the binding-affinity information. The binding affinity can be the inhibition constant (Ki), the half-maximal inhibitory concentration (IC50), the half-maximal effective concentration (EC50), or the dissociation constant (Kd).
The codes obtained on the site, following user-defined search criteria, must be pasted into the appropriate box on the download screen. Before pasting, use the cleaning button to clear the code field and the experiment name. After pasting the PDB codes, fill in the field with the name of the experiment and use the save option. Then select the download button. When the progress bar reaches 100%, you may close this screen. Whenever the download screen opens, the PDB codes of the current experiment are loaded.
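
For readers who want to reproduce the coordinate download outside the graphical interface, a minimal sketch follows. It is not Taba's code: the function names and the folder layout are hypothetical, and the URL pattern is the RCSB file-download address, which is assumed here to serve the same PDB files. The binding-affinity download is not covered.

```python
import re
import urllib.request
from pathlib import Path


def parse_codes(pasted_text):
    """Split pasted text into unique, upper-case PDB codes, keeping order."""
    codes = []
    for token in re.split(r"[,\s]+", pasted_text.strip()):
        code = token.upper()
        if code and code not in codes:
            codes.append(code)
    return codes


def pdb_url(code):
    """Build the download address for one PDB entry (assumed URL pattern)."""
    return f"https://files.rcsb.org/download/{code.upper()}.pdb"


def download_pdb(code, experiment_dir):
    """Save one PDB file into the experiment folder and return its path."""
    folder = Path(experiment_dir)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{code.upper()}.pdb"
    urllib.request.urlretrieve(pdb_url(code), str(target))
    return target


# 2A4L is the CDK2-roscovitine entry cited in the text; 1abc is a placeholder.
codes = parse_codes("2a4l, 2A4L\n1abc")
print(codes)
print(pdb_url(codes[0]))
```

Calling download_pdb("2A4L", "my_experiment") would then fetch the file over the network into a folder named my_experiment, mirroring the one-folder-per-experiment convention.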

Generate files for regression: This functionality is essential for the generation of machine-learning models. Taba uses the scikit-learn library to implement supervised machine learning techniques (Pedregosa et al., 2011). The user can select the maximum intermolecular distance that Taba will consider between a ligand atom and a protein atom. The allowed values are 3.5, 4.5, 6.0, 7.5, and 9.0 Å. This feature randomly generates two sets of files, one for training and another for testing, and the user can select the seed that drives this randomness. For each dataset (training and test), four files are generated, to be selected later for regression. Three of them hold the binding information from one database each: PDBbind (Wang et al., 2004), BindingDB (Liu et al., 2007), and Binding MOAD (Hu et al., 2005). The fourth file groups these three together.
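
A seeded training/test split of this kind can be sketched as follows. The function name, the 30% test fraction, and the placeholder PDB codes are assumptions made for the example; only the list of allowed cutoff values comes from the description above.

```python
import random

# Maximum intermolecular distances (Å) listed in the text.
ALLOWED_CUTOFFS = (3.5, 4.5, 6.0, 7.5, 9.0)


def split_codes(pdb_codes, seed, test_fraction=0.3, cutoff=4.5):
    """Return (training, test) lists of PDB codes, reproducible for a seed.

    `test_fraction` is a hypothetical default; the text does not state
    the proportion that Taba uses.
    """
    if cutoff not in ALLOWED_CUTOFFS:
        raise ValueError(f"cutoff must be one of {ALLOWED_CUTOFFS}")
    codes = sorted(pdb_codes)           # fixed starting order
    random.Random(seed).shuffle(codes)  # same seed, same shuffle
    n_test = max(1, round(len(codes) * test_fraction))
    return codes[n_test:], codes[:n_test]


# 2A4L is cited in the text; the other codes are placeholders.
codes = ["2A4L", "AAA1", "AAA2", "AAA3", "AAA4",
         "AAA5", "AAA6", "AAA7", "AAA8", "AAA9"]
train, test = split_codes(codes, seed=42)
print(len(train), len(test))
```

Sorting before shuffling makes the split depend only on the seed and the set of codes, not on the order in which the codes were pasted.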


Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res., 2000, 28(1), 235-242.

De Azevedo, W.F.; Leclerc, S.; Meijer, L.; Havlicek, L.; Strnad, M.; Kim, S.H. Inhibition of cyclin-dependent kinases by purine analogues: crystal structure of human cdk2 complexed with roscovitine. Eur. J. Biochem., 1997, 243(1-2), 518-526.

Hu, L.; Benson, M.L.; Smith, R.D.; Lerner, M.G.; Carlson, H.A. Binding MOAD (Mother Of All Databases). Proteins: Struct. Funct. Genet., 2005, 60(3), 333-340.

Liu, T.; Lin, Y.; Wen, X.; Jorissen, R.N.; Gilson, M.K. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res., 2007, 35(Database issue), D198-D201.

Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res., 2011, 12, 2825-2830.

Wang, R.; Fang, X.; Lu, Y.; Wang, S. The PDBbind Database: Collection of Binding Affinities for Protein-Ligand Complexes with Known Three-Dimensional Structures. J. Med. Chem., 2004, 47(12), 2977-2980.