Please cite the following reference (da Silva AD et al., 2020) if the Taba program was useful.        

da Silva AD, Bitencourt-Ferreira G, de Azevedo WF Jr. Taba: A Tool to Analyze the Binding Affinity. J Comput Chem. 2020; 41(1): 69-73. doi: 10.1002/jcc.26048.

How to Install 

You need to have Python 3 installed on your computer to run Taba. In addition, you also need NumPy (1.14.5*), Matplotlib, scikit-learn (0.19.1*), PyQt4 and SciPy (1.1.0*).
*You can use higher versions as well.
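
Before launching Taba, it can help to confirm that the packages listed above are visible to your Python 3 interpreter. The short script below is only a convenience sketch and is not part of the Taba distribution; the function name check_dependencies is hypothetical. It relies on the standard importlib module and on the usual import names of each package (scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names of the packages listed above, with the minimum versions
# quoted in the text (None where the text gives no version).
REQUIRED = {
    "numpy": "1.14.5",
    "matplotlib": None,
    "sklearn": "0.19.1",  # scikit-learn is imported as "sklearn"
    "PyQt4": None,
    "scipy": "1.1.0",
}


def check_dependencies(required=REQUIRED):
    """Return {import_name: bool} telling whether each package can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in required}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        minimum = REQUIRED[name]
        note = " (version >= {} suggested)".format(minimum) if minimum else ""
        print("{:12s} {}{}".format(name, "found" if found else "MISSING", note))
```

The script only reports whether each package can be found; it does not compare installed versions against the minimum ones.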

On Windows:

  • Step 1. Download Taba (available here)
  • Step 2. Unzip the zipped file TABA_dist
  • Step 3. Copy the TABA_dist directory to c:\
  • Step 4. Open a command prompt window and type: cd c:\TABA_dist
  • Step 5. Then type: python

On Linux or another Unix-like system:

  • Step 1. Download Taba (available here)
  • Step 2. Unzip the zipped file TABA_dist
  • Step 3. Copy the TABA_dist directory to the directory of your choice
  • Step 4. Open a terminal and type: cd /your personal directory/TABA_dist
  • Step 5. Then type: python

This launches a GUI window for Taba. That's it; have a good Taba session. See the help for additional information about how to run Taba.


The basic idea behind Taba is that the structural features that determine ligand-binding affinity are already imprinted in the three-dimensional structures of protein-ligand complexes. An ensemble of crystallographic structures for which ligand-binding affinity data are available provides the raw data that Taba uses to generate a target-based polynomial scoring function. To build this scoring function (flowchart below), Taba reads all structures available for a biological system of interest and calculates the average distance for each type of atom pair. For instance, consider intermolecular Carbon-Carbon distances, where one Carbon belongs to the protein and the other to the ligand. Taba calculates the average intermolecular distance for the Carbon-Carbon pair and treats this length as the equilibrium distance for that pair, by analogy with a mass-spring system. For a given structure, displacement from this equilibrium distance increases the energy of the system, again following this naïve mass-spring analogy.

Flowchart showing the main steps used to generate target-based scoring functions with Taba (da Silva AD et al., 2020).

For each type of atom pair, Taba calculates the average intermolecular distance and takes it as the equilibrium distance for that pair. There is one equilibrium distance for the Carbon-Carbon pair, another for the Carbon-Oxygen pair, and so on. The animated figure below shows the oscillation of a mass-spring system: displacement from equilibrium generates a restoring force that drives the system back in the opposite direction, producing harmonic motion.

Mass-spring system in an undamped oscillation movement (the program Mathematica generated the above animation, the code is available here).
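
The averaging step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Taba's own code: atoms are given as hypothetical (element, (x, y, z)) tuples rather than parsed from PDB files, all distances of a pair type are pooled over the ensemble before averaging, and the function names, the toy coordinates, and the spring constant k are invented for the example. The harmonic_penalty function simply takes the mass-spring analogy literally. The math.dist call requires Python 3.8 or later.

```python
import math
from collections import defaultdict


def pair_distances(protein_atoms, ligand_atoms, cutoff=4.5):
    """Collect protein-ligand distances (in Å) per element pair, up to cutoff.

    Each atom is a tuple (element, (x, y, z)).
    Keys of the result are (protein_element, ligand_element).
    """
    distances = defaultdict(list)
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)
            if d <= cutoff:
                distances[(p_elem, l_elem)].append(d)
    return distances


def equilibrium_distances(complexes, cutoff=4.5):
    """Average the distances of each pair type over all complexes."""
    pooled = defaultdict(list)
    for protein_atoms, ligand_atoms in complexes:
        found = pair_distances(protein_atoms, ligand_atoms, cutoff)
        for pair, values in found.items():
            pooled[pair].extend(values)
    return {pair: sum(v) / len(v) for pair, v in pooled.items()}


def harmonic_penalty(distance, equilibrium, k=1.0):
    """Mass-spring analogy: energy grows with the squared displacement."""
    return 0.5 * k * (distance - equilibrium) ** 2


# Two toy "complexes" with made-up coordinates (not real structures).
complex_a = ([("C", (0, 0, 0)), ("O", (1, 0, 0))], [("C", (3, 0, 0))])
complex_b = ([("C", (0, 0, 0))], [("C", (0, 4, 0))])

eq = equilibrium_distances([complex_a, complex_b])
# eq[("C", "C")] is 3.5 (mean of 3.0 and 4.0); eq[("O", "C")] is 2.0
```

In this toy ensemble, a Carbon-Carbon contact at 4.0 Å is displaced by 0.5 Å from the 3.5 Å average, so harmonic_penalty(4.0, eq[("C", "C")]) gives 0.125 with k = 1.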

As highlighted above, applying Taba requires an ensemble of crystallographic structures for which ligand-binding affinity is known. This set of structures is used to train the model. In the first round, Taba calculates the average distance for each type of atom pair. In the second round, Taba applies supervised machine learning techniques to determine the relative weight of each type of atom pair. Taba treats the intermolecular distances for each type of atom pair as explanatory variables. The response variable is the log of the binding affinity, for instance, log(Ki), where Ki is the inhibition constant. Taba considers the following atoms from the protein structure: C, N, O, S, and P. For the ligands, Taba uses the following atoms: C, N, O, S, F, Cl, Br, I, and P.
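
To make the second round concrete, the sketch below fits one weight per atom-pair type by ordinary least squares. It is a minimal illustration with assumptions stated plainly: NumPy's lstsq stands in for the scikit-learn regression methods that Taba actually uses, the feature matrix is synthetic (random numbers, not values derived from real structures), and the names fit_pair_weights, true_weights, and true_intercept are hypothetical.

```python
import numpy as np


def fit_pair_weights(features, log_affinity):
    """Least-squares fit of log(affinity) = features @ weights + intercept.

    features:     array of shape (n_complexes, n_pair_types)
    log_affinity: array of shape (n_complexes,), e.g. log(Ki)
    Returns (weights, intercept).
    """
    # Append a column of ones so the intercept is fitted with the weights.
    design = np.column_stack([features, np.ones(len(features))])
    coef = np.linalg.lstsq(design, log_affinity, rcond=None)[0]
    return coef[:-1], coef[-1]


# Synthetic training set: 20 complexes described by 3 pair-type features
# (think of C-C, C-N, and C-O terms); values are random, not real data.
rng = np.random.RandomState(0)
X = rng.uniform(3.0, 4.5, size=(20, 3))

# A response built from known weights, so the fit can be checked.
true_weights = np.array([0.8, -0.5, 0.3])
true_intercept = -7.0
y = X @ true_weights + true_intercept

weights, intercept = fit_pair_weights(X, y)
```

Because the synthetic response is noise-free, the fitted weights reproduce true_weights to numerical precision. With experimental affinities the fit would be approximate, which is why a separate test set is needed to assess the model.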

The Experiment

Taba adopts a specific concept of experiment. An experiment is a set of files in the Protein Data Bank (PDB) format (Berman et al., 2000), together with the PDB access codes, the ligand-binding information file, the configuration file, the transformed files for regression, and the resulting files. In other words, an experiment is the set of data generated for a set of PDB entries of a particular protein family, along with their associated records. Every experiment has its own folder, named by the user.

The Tool

Taba has a main screen, where you select the desired task, and six other screens with various functionalities. In addition to these feature screens, there is one screen with help and another with general information about Taba. To run an experiment, follow the order of the buttons on the main screen, from left to right.

Main Features of the Taba

Experiment Management: Use this functionality before starting any experiment. It allows you to save the current experiment, open an existing experiment, or delete the current experiment. Before deleting an experiment, check whether you need to save it first.

Downloading PDB files: This feature allows the user to download data from the Protein Data Bank site (Berman et al., 2000). Taba can download two types of files: the PDB file with the atomic coordinates and a second file with the binding-affinity information. The binding affinity can be the inhibition constant (Ki), half-maximal inhibitory concentration (IC50), half-maximal effective concentration (EC50), or dissociation constant (Kd).
The codes obtained on the site, following user-defined search criteria, must be pasted into the appropriate box on the download screen. Before pasting, use the cleaning button to clear the code field and the name of the experiment. After pasting the PDB codes, fill in the field with the name of the experiment and use the save option. Then select the download button. When the progress bar reaches 100%, you may close this screen. Whenever the download screen opens, the PDB codes of the current experiment are loaded.
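
For readers who want to see what this step involves outside the GUI, the sketch below cleans a pasted list of PDB codes and fetches the coordinate files. It is not Taba's implementation: the function names are hypothetical, the use of the RCSB file server address is an assumption about where the files come from, and the binding-affinity file that Taba also downloads is not covered.

```python
import re
import urllib.request
from pathlib import Path

# Public RCSB file server; {code} is a four-character PDB access code.
RCSB_URL = "https://files.rcsb.org/download/{code}.pdb"


def parse_pdb_codes(text):
    """Split pasted text on commas or whitespace and keep valid, unique codes.

    A classic PDB code is a digit followed by three letters or digits.
    Codes are returned in upper case, in the order they first appear.
    """
    codes = []
    for token in re.split(r"[,\s]+", text.strip()):
        code = token.upper()
        if re.fullmatch(r"[0-9][A-Z0-9]{3}", code) and code not in codes:
            codes.append(code)
    return codes


def download_pdb(code, folder):
    """Download one PDB file into folder and return the path of the saved file."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "{}.pdb".format(code)
    urllib.request.urlretrieve(RCSB_URL.format(code=code), str(target))
    return target


# Example with illustrative codes; the duplicate entry is dropped.
codes = parse_pdb_codes("2a4l, 1hck\n2A4L")
```

Calling download_pdb(code, "my_experiment") for each entry in codes would then save the files into a folder named after the experiment, where "my_experiment" is a placeholder name.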

Generate files for regression: This functionality is essential for the generation of machine-learning models. Taba uses the scikit-learn library to implement supervised machine learning techniques (Pedregosa et al., 2011). The user selects the maximum intermolecular distance that Taba will consider between a ligand atom and a protein atom. The allowed values are 3.5, 4.5, 6.0, 7.5, and 9.0 Å. This feature randomly generates two sets of files, one for training and another for testing; the user can select the seed that controls this randomness. For each dataset (training and test), four files are generated, to be selected later for regression. Three of them use the binding information from three other databases: PDBbind (Wang et al., 2004), BindingDB (Liu et al., 2007), and Binding MOAD (Hu et al., 2005). The fourth file groups these three together.
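
The role of the seed and of the distance cutoff can be illustrated with a short sketch. The names below are hypothetical, and the 30% test fraction is an assumption made for the example; the text above does not state which proportion Taba uses.

```python
import random

# Maximum intermolecular distances (in Å) listed above.
ALLOWED_CUTOFFS = (3.5, 4.5, 6.0, 7.5, 9.0)


def validate_cutoff(cutoff):
    """Return cutoff if it is one of the allowed values, else raise ValueError."""
    if cutoff not in ALLOWED_CUTOFFS:
        raise ValueError(
            "cutoff must be one of {}, got {}".format(ALLOWED_CUTOFFS, cutoff)
        )
    return cutoff


def split_train_test(pdb_codes, seed, test_fraction=0.3):
    """Shuffle the codes with a seeded generator and split them in two.

    The same seed always yields the same split, which makes an
    experiment reproducible. Returns (training_codes, test_codes).
    """
    rng = random.Random(seed)
    shuffled = list(pdb_codes)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Ten placeholder identifiers standing in for PDB codes.
example_codes = ["S{:02d}".format(i) for i in range(10)]
train, test = split_train_test(example_codes, seed=42)
```

Running the split twice with the same seed gives identical training and test sets, whereas a different seed will generally produce a different partition of the same structures.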


References

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000; 28(1): 235-242.

da Silva AD, Bitencourt-Ferreira G, de Azevedo WF Jr. Taba: A Tool to Analyze the Binding Affinity. J Comput Chem. 2020; 41(1): 69-73.

de Azevedo WF, Leclerc S, Meijer L, Havlicek L, Strnad M, Kim SH. Inhibition of cyclin-dependent kinases by purine analogues: crystal structure of human cdk2 complexed with roscovitine. Eur J Biochem. 1997; 243(1-2): 518-526.

Hu L, Benson ML, Smith RD, Lerner MG, Carlson HA. Binding MOAD (Mother Of All Databases). Proteins: Struct Funct Genet. 2005; 60(3): 333-340.

Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007; 35(Database issue): D198-201.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011; 12: 2825-2830.

Wang R, Fang X, Lu Y, Wang S. The PDBbind Database: Collection of Binding Affinities for Protein-Ligand Complexes with Known Three-Dimensional Structures. J Med Chem. 2004; 47(12): 2977-2980.