Computer-assisted drug design for the development of novel protease inhibitors using molecular docking algorithms
Arben Kojtari
Drexel University Department of Chemistry
Drexel University, Philadelphia, PA 19104
Submitted December 4, 2010

(I) Introduction
(II) Docking Algorithms
(III) Proteases & Their Mechanisms
(IV) Published Docking Results of Protease Inhibitors
(V) Future Interest in Prostate-Specific Antigen
(VI) Conclusions

Structure-based drug design is a recent avenue in medicinal chemistry that is currently being explored as a plausible method of developing novel drug compounds based on a three-dimensional structure of a target protein. The goal is high-throughput analysis of numerous compounds within a chemical library on a specific target using molecular docking in conjunction with other simulations. This is quite contrary to traditional ligand-based design, which involves the syntheses of a chemical library followed by a screening process to prioritize chemicals with highest activity and typically followed by a quantitative structure-activity relationship, or QSAR, to develop new analogs of active compounds, however, this requires a pre-existing library of compounds for the designated target and the screening process is time-consuming (1). Molecular docking has since aided medicinal chemists by alleviating the amount of time spent on screening processes, since docking approaches can cover millions of compounds in a week using a virtual screening process (2). Along with virtual screening, computer-based de novo drug design can also be accomplished to test small molecules' affinities for a given target protein via docking prior to their syntheses, which is both time-saving and cost-effective (3). Recent publications in molecular modeling have included proteases as potential targets using virtual screening methods as a means to identify possible drug candidates within a chemical library as well as de novo drug design for novel compound discovery.

Docking Algorithms
Numerous docking algorithms have been developed since the advent of molecular docking. One of the first docking simulations involved bovine pancreatic trypsin and its inhibitor BPTI when the protein's X-ray crystal structure was available in 1975 (4). The results paved the way for future computer simulations related to biomolecules in the understanding of molecular dynamics by using X-ray structures of proteins. Since then, computer-aided drug screening and design has aided scientists in exploring new possible lead compounds for a specified target. One of the those fundamental simulations that has facilitated drug discovery and design is the concept of molecular docking. The molecular docking computer simulation allows researchers to screen a chemical library for possible active compounds for a specific target in silico. The method is far less expensive and time-consuming as conventional screening and bioassays. It also allows users to design novel ligands de novo as possible leads towards drug discovery and eventually its syntheses.
The docking process is a simple concept that was designed around the concept of a protein-ligand interaction. In the simulation, the target protein is recognized as a receptor and the ligand is identified as a binding molecule. Solvent can be added to the simulation in order to form the final protein-ligand complex and its score is calculated using an algorithm that is defined depending on which docking program is utilized by the user (2). Figure 1, which was originally illustrated by Diller, D.J. and Merz Jr., K.M., represents the process in which how a simple protein-ligand complex is formed. The docking procedure is defined as an interaction of the ligand atoms to hot spots on the binding cavity or surface of the target protein. These hot spots are typically the functional groups of amino acid residues on the protein. For example, for a serine protease such as trypsin, SER195 residue would most likely be a hotspot within the active site of the protein with the hydroxyl functional group being specified. After the hotspots are identified and the final protein-ligand complex is formed using gradient-based optimization, the ligand can be translated, its bonds rotated, and its orientation adjusted. These are typically defined as ligand poses by the docking programs. The proteins themselves are usually fixed in a position and the global optimum is calculated and determined. Each individual pose of the ligand is scored and recorded by the program, although scoring is varied depending on which docking program and its respective algorithm is utilized by the user. It should be noted that the binding site is usually identified and the ligand is restricted within a box or sphere of the binding site, which encompasses both the interaction location on the protein and the ligand (5).

Figure 1 Schematic representation of the step-by-step formation of the protein-ligand complex (2)

Docking simulations have been categorized into two classes as either combinatorial or stochastic. The combinatorial method uses an incremental construction of the ligand fragments after an anchor is docked within the binding site of the protein. The anchor is held rigid within the site and the fragments of the ligand are implemented. This process is described as an "anchor-and-grow" system which the fragments can either be held rigid or flexed if desired, such as rotating bonds after implementation of the fragments (5, 6). The stochastic docking algorithms differ from the combinatorial method in that the ligand itself is kept as a whole and changes are introduced to the ligand by varying translation of the ligand, rotation of bonds, and adjusting torsion angles (5, 7) . The global energy minimum is calculated after the adjustments to the ligand. Genetic algorithms can be followed in order to swap certain portions of the original ligand to create a diverse class of compounds (5). Each method described here has its advantages and flaws with respect to the docking problem; the combinatorial docking algorithm is less accurate than current stochastic methods and will sometimes be trapped within a local energy minimum rather than a global energy minimum, but is optimal for screening methods within a chemical database since the calculations can be accomplished at a faster rate compared to stochastic docking algorithms (2).
Combinatorial docking algorithms include DOCK, FlexX, HammerHead and Structure Based Focusing. Stochastic docking algorithms include AutoDock, GOLD, TABU, LigandFit, Glide and SAS (2, 8). Each one of these docking programs has its own scoring method to rank poses of the ligands with the receptors. Some of these docking simulations will utilize a forcefield along with their scoring algorithm. GOLD Dock allows application of the CHARMm forcefield on the receptor once binding sphere coordinates and radius are inputted. Others, such as LibDock, does not require an application of a forcefield but the docking site must be defined (8). In short, the docking algorithms each score ligand poses using a stochastic or combinatorial approach along with other unique aspects and/or subsequent algorithms that vary their accuracies and performances in predicting protein-ligand complexes.

Proteases & Their Mechanisms
Enzymes that exhibit proteolytic activity have been targeted as potential sites of inhibition to attack certain diseases and infections. By inhibiting these sites, certain life-threatening disease states can be reversed or reduction of symptoms. Two proteases that have been studied in treating infectious diseases are severe acute respiratory syndrome- associated coronavirus, or SARS CoV, and human immunodeficiency virus (HIV).
SARS CoV is characterized as a deadly form of pneumonia with symptoms including high fever, myalgia, lymphopenia, chills, cough, build-up of infiltrates in the lungs and is easily transmitted from person to person (14). SARS CoV was identified as being highly contagious and worries of a pandemic spread across the populations of continents since it appeared to be a novel type of coronavirus (15). One target studied to aid the treatment of SARS CoV was its viral cysteine protease, which is part of the trypsin-like serine protease family of proteolytic enzymes (16). One of the first crystal structures of the protease was determined by Haitao Yang and colleagues in Tsinghua University and National Laboratory of Biomacromolecules in Beijing, China (17). The protein is 33.8 kDa in molecular weight and is sometimes referred to as a 3C-like protease. Figure 2 shows an illustration of the SARS CoV protease using X-ray crystallographic data.

Figure 2 SARS CoV cysteine protease. Both a general illustration of the domains and the X-ray crystallographic data are shown with the bound peptide inhibitor (17).

The mechanism of serine proteases in biological systems is well understood with respect to the catalytic triad; Histidine 57, Aspartic Acid 102, and Serine 195 (10, 18). SER195's hydroxyl group acts as a nucleophile and attacks the carbonyl of a residue of the peptide bond. The specific residue is dependent on which protease is being studied. The nitrogen lone pair on the HIS57 ring acts to accept the hydrogen of the hydroxyl group SER195 to facilitate the nucleophilic attack on the carbonyl. One pair of electrons from the CO double bond is pushed onto the oxygen, leaving it with 3 lone pairs of electrons and a formal charge of -1. This is described as the tetrahedral intermediate of the process. Once a lone pair from the oxygen is pushed back to form the CO double bond, the peptide bond is broken and the amine group from the peptide picks off the proton from the protonated HIS57 residue. In order to break the enzyme-acyl intermediate between the peptide substrate and SER195, a water molecule acts as a nucleophile to attack the carbonyl once again with the nitrogen atom from the histidine residue stabilizing the result by hydrogen bonding with a proton from the water molecule. A pair of electron from the CO double bond is once again pushed towards the oxygen atom due to this nucleophilic attack. A pair of the electrons from oxygen are pushed back to reform the CO double bond and break the enzyme-substrate bond between the peptide and SER195. The hydroxyl group of SER195 is formed after the oxygen from this residue attacks the proton from the protonated HIS57 group. ASP102 acts to facilitate the activity of HIS57 during this process. It should also be noted that GLY193 helps form the oxyanion hole to the lower the activation energy of the reaction (18).
HIV-1 protease has also been extensively studied as an attractive target for designing a small molecule inhibitor to attack the virus after infection and in the case of viral resistance to current inhibitors. The increased research focus on designing novel HIV inhibitors has been due to
worry of an AIDS epidemic spread (19). The HIV virus is able to use human cellular machinery to an aspartyl protease that plays an important role in the virus's progression. The HIV aspartyl protease is different than proteases transcribed in humans, specifically that it is a homodimer and requires both subunits to form the active site of the protein (20). Numerous published X-ray crystal structures are available on the protein data bank web site, including a structure of the Saquinavir inhibitor bound to the HIV-1 protease published by Yunfeng Tie and colleagues (21). Figure 3 shows an illustration of the HIV-1 protease with the drug Saquinavir bound to its target.

Figure 3 Saquinavir bound to active HIV-1 aspartyl protease viewed using JMOL (21).

Published Docking Results of Protease Inhibitors
Significant developments have been made using molecular modeling as a means to identifying possible lead compounds for inhibition of viral proteases in treating the progression of infectious diseases. Although the most outstanding case that has been published is the design of potent small molecule inhibitors of the HIV-1 aspartyl protease, SARS CoV cysteine protease has also been targeted recently. Dariusz Plewczynski and colleagues at the Interdisciplinary Centre for Mathematical and Computational Modelling at Warsaw University in Poland had identified inhibiting substrates of the SARS CoV protease that could be further investigated for treatment of the viral infection. The group utilized the eHiTS flexible docking algorithm to simulate binding of known substrates that have previously shown affinity for the protease by screening the protein data bank, then followed up with modification of the original substrates by using Tanimoto's coefficient and use the docking algorithm to score these modified substrates. All compounds were added to their mini library and their eHiTS scores were published as a result (16).
The goal of the research was to show that modification of these peptide substrates at the peptide bond could have similar affinity for the active site of the SARS CoV protease that could act as mimic so that it could potentially block the activity of the target protein. The significance is that the group identified a substrate analogous to the peptide substrate, identified as AG7, and modified a position of the peptide bond with a phosphorate ether that drastically alters the radius of interaction between the analog and the protein. Also, it was reported that score for the AG7 modified substrate had a better score of -5.940 than the original peptide substrate, which had a score of -5.615. As a result, the simulation suggests that the modified AG7 substrate has a stronger affinity for the SARS-CoV protease than the unmodified peptide substrate that has known affinity for the target (16). Significant results are displayed in Table 1.

Table 1 Peptide inhibitor eHiTS scores and their modified analogs of the SARS CoV cysteine protease (16).

HIV-1 protease inhibitors have also received a significant amount of attention due to its potential impact in prevention of a further epidemic of the virus. Vijay M. Khedkay and colleagues at the Department of Pharmaceutical Chemistry at the Bombay College of Pharmacy in India explored docking symmetric cylic urea inhibitors of HIV-1 proteases and creating analogs of such substrates with certain descriptors adjacent to the the carbonyl of the compound. The GOLD docking algorithm was utilized and the results were compared to the crystal structure of the HIV-1 protease and a cyclic urea analog DMP323 that were co-crystallized. The root mean squared deviation of the cyclic urea analog DMP323 calculated by docking compared to experimental results were found to be 0.49 Angstroms, which means the results can be reproduced reliably with other analogs created in silico (20).
The results of the molecular docking studies has shown a set of potential lead compounds for inhibition of the HIV-1 protease. DMP323 compound, which is a known inhibitor of the protease, has a lower score for inhibition of the protease compared to some of the lead compounds designed by the group. The average score for inhibition for DMP323 was 9.92 for a predicted pKi. The top compound simulated had a predicted pKi of 12.32, which is significantly greater than the DMP323 compound with known activity (20). It should be noted that GOLD and Lib Dock scores ligands differently. The higher, more positive score is preferred. Each docking algorithm differs with respect to this, whether it is a positive or negative value placed in front of the score calculated. The significance of the simulation study was that it provided insight towards the design of new symmetric cyclic urea analogs by using site modification that could be utilized for more potent small molecule inhibitors as well as for HIV mutants that have developed resistance to available cyclic urea analogs for the treatment of the viral infection. Both studies were able to stay within certain tolerance limits of proper docking poses of the substrates identified by using co-crystallized ligand-protein complexes (16, 20). Significant results from this research are displayed in Table 2.

Table 2 Symmetric cyclic urea inhibitors and modified analogs of the HIV-1 aspartyl protease. GOLD Dock algorithm was used to dock small molecules into active site and theoretical pKi's were calculated (20).

Future Interest in Prostate-Specific Antigen
Rational drug design and taking a structure-based approach is fundamental in our research goals. The target protein currently being studied is prostate-specific antigen, or PSA, which is part of the kallikrein family of serine proteases regulated by the androgen receptor. Over expression of the PSA protein is usually a biomarker for prostate cancer, since it is responsible for cleavage of certain growth factors that influence cell proliferation, but other processes are not well-defined (9). Although not many substrates for PSA have been identified, it has been noticed that prostate-specific antigen has similar activity to other serine proteases, specifically chymotrypsin, that it shares an identical catalytic triad of Histidine 57, Aspartic Acid 102, and Serine 195 (10). Table 3 displays the homology between kallikrein-related proteases described by Menez, Renee et al.

Table 3 Peptide sequence homology between prostate-specific antigen with other kallikrein family of proteases. HIS57, ASP102, and SER195 are conserved for all sequences (10).

The assumption being made is that since there is evidence of homology between proteases of the kallikrein family and the catalytic triad is conserved with respect to biological mechanism, that is, they are all serine proteases, then small molecules that are active for one specific type of serine protease should also have some affinity for another type of related protein such as prostate-specific antigen. It may seem that the particular statement made is far too simplistic, however, it is a good starting point since one drug-protein interaction should share some binding characteristics of another protein with similar homology (11, 12). One class of substrates that are being studied for inhibition of serine proteases are peptidyl (alpha-amino alkyl) phosphonate esters, which are irreversible inhibitors of the target proteins. Previous research has demonstrated that derivatives of this inhibitor, by varying the peptidyl chain of the 4-amidinophenylglycine phosphonate diphenyl ester, has shown significant inhibition of trypsin as well as granzyme A and K (13). The anchoring fragment of the inhibitor is drawn in Figure 4.

Figure 4 Suggested mechanism of inhibition for 4-amidinophenylglycine phosphonate diphenyl ester derivatives with trypsin-like serine proteases. The extended binding substrate represents the peptidyl fragment of the inhibitor (13).

The overall goal is to design a novel compound using the 4-amidinophenylglycine phosphonate diphenyl ester derivative as a starting point utilizing docking simulations available in Accelrys Discovery Studio 2.1, and soon version 3.0 to accomplish this task. Currently, the Lib Dock docking algorithm is being used to dock small molecules into the target protein. Lib Dock scores each individual pose of the ligand and displays the results in a viewer as well as a table. The score is calculated by using a hydrogen-bonding potential or steric potential. The mathematical form is shown in Figure 5, which was described by D.J. Diller (2).

Figure 5 The first represents the scoring function for conformations of docked ligands. The second function is the rigid body transformation where R is a 3x3 rotation matrix and T is translation. The third function is used to score two potentials between two atoms for the final optimization step (2).

The score takes into account hydrogens between the ligand and the binding site as apolar, hydrogen bond donating, hydrogen bond accepting, or donor/acceptor. Before this, certain thresholds can be established in terms if discarding poses that have a certain percentage of steric clashes between ligand atoms. Matches are optimized afterwards using a BFGS optimization algorithm (2). Other parameters can be set such as quality of docking by, for example, ranging match tolerance from 0.25 Angstroms to 0.5 Angstroms, as well as adjusting or limiting the number of hotspots on the binding site. The number of saved poses can also be modified. Currently, virtual library screening is something of interest in determining other possible substrates that could potentially inhibit prostate-specific antigen's activity and modify current target ligands to optimize binding affinity for PSA. Figure 6 displays a screenshot of Accelrys Discovery Studio 2.1 performing a small molecule docking simulation, Lib Dock.

Figure 6 Accelrys Discovery Studio 2.1 Lib Dock simulation of 4-amidinophenylglycine phosphonate diphenyl ester derivatives docking onto the binding site of PSA. Image taken by Arben Kojtari November 11, 2010.

The structure-based approach using molecular docking algorithms is a powerful simulation technique to screen a multitude of compounds within a chemical library by utilizing either a stochastic or combinatorial method. The ultimate goal is to facilitate the screening process and determining possible lead compounds that can be later synthesized and tested in vivo and so on. The limitation of these scoring algorithms is that the scoring functions can be limited in terms of number of compounds screened within a certain accuracy and processing speed. It is difficult to determine a proper pose of a ligand on a binding site of a target protein in a simulation without prior knowledge of how it would dock in a real world scenario. By this statement, docking algorithms are best used when co-crystallization X-ray crystallography data is available of a known compound and then comparing the docked pose in a simulation to experimental results. These simulations are still useful in modifying small molecules with known activities to create possible lead compounds that could potentially show a higher affinity for the target. It should also be mentioned that even after lead compounds are determined after a simulation, other factors must be considered such as toxicity, bioavailability, ease of synthesis (SYLVIA score), etc. As the understanding of biomolecules expands and computing power continues to grow, molecular docking will see in increased role in drug design.

1. Ajmani, S.; Jadhav, K.; Kulkarni, S.A. QSAR Comb. Sci. 2009, 1, 36-51. DOI: 10.1002/qsar.200810063

2. Diller, D.J.; Merz Jr., K.M. Proteins: Structure, Function, and Bioinformatics 2001, 43, 113-124. DOI: 10.1002/1097-0134(20010501)43:2<113::AID-PROT1023>3.0.CO;2-T

3. Shimada, J.; Ekins, S.; Elkin, C.; Shakhnovich, E.I., Wery, J. Targets 2002, 1, 196-205. DOI: 10.1016/S1477-3627(02)02274-2

4. Kaprlus, M.; McCammon, J.A. Nature Structural Biology 2002, 9, 646-652. DOI: 10.1038/nsb0902-646

5. Brooijams, N.; Kuntz, I.D. Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 335-373. DOI: 10.1146/annurev.biophys.32.110601.142532

6. DesJarlais, R.L.; Sheridan, R.P.; Dixon, J.S.; Kuntz, I.D.; Venkataraghavan, R. J. Med. Chem. 1986, 29, 2149-2153. DOI: 10.1021/jm00161a004

7. Abagyan R.; Totrov, R. J. Mol. Biol. 1994, 235, 983-1002. DOI: 10.1006/jmbi.1994.1052

8. Rao, S.N.; Head, M.S.; Kulkarni, A.; LaLonde, J.M. J. Chem. Inf. Model. 2007, 47, 2159-2171. DOI: 10.1021/ci6004299

9. Balk, S.P.; Ko, Y.; Bubley, G.J. J. Clinical Oncology 2003, 21, 383-391. DOI: 10.1200/JCO.2003.02.083

10. Menez, R.; Michel, S.; Muller, B.H.; Bossus, M.; Ducancel, F.; Jolivet-Reynaud, C.; Stura, E.A. J. Mol. Biol. 2008, 376, 1021-1033. DOI: 10.1016/j.jmb.2007.11.052

11. Dean, P.M. Journal of Molecular Recognition 1988, 1, 381. DOI: 10.1002/jmr.300010308

12. Kuntz, I.D. Science 1992, 257, 1078-1082. DOI: 10.1126/science.257.5073.1078.

13. Jackson, D.S.; Fraser, S.A.; Ni, L.; Kam, C.; Winkler, U.; Johnson, D.A.; Froelich, C.J.; Hudig, D.; Powers, J.C. J. Med. Chem. 1998, 41, 2289-2301. DOI: 10.1021/jm970543s

14. Neuman, B.W.; Stein, D.A.; Kroeker, A.D.; Churchill, M.J.; Kim, A.M.; Kuhn, P.; Dawson, P.; Moulton, H.M.; Bestwick, R.K.; Iversen, P.L.; Buchmeier, M.J. J. Virol. 2005, 79, 9665-9676. DOI: 10.1128/JVI.79.15.9665-9676.2005

15. Coronavirus never before seen in humans is the cause of SARS, World Health Organization, <>. Accessed 30 November 2010.

16. Plewczynski, D.; Hoffman, M; von Grotthuss, M.; Knizewski, L.; Eitner, K.; Ginalski, K. J. Phys.: Condens. Matter 2007, 19. DOI: 10.1088/0953-8984/19/28/285207

17. Yang, H.; Yang, M.; Ding, Y.; Liu, Y.; Lou, Z.; Zhou, Z.; Sun, L.; Mo, L.; Ye, S.; Pang, H.; Gao, G.F.; Anand, K.; Bartlam, M.; Hildenfeld, R.; Rao, Z. PNAS 2003, 100, 13190-13195. DOI: 10.1073/pnas.1835675100

18. Whiting, A.K.; Peticolas, W.L. Biochemistry 1994, 33 (2), 552-561. DOI: 10.1021/bi00168a021

19. Akaho, E.; Morris, G.; Goodsell, D.; Wong, D.; Olson, A. J. Chem. Software 2001, 7, 103-114. DOI:

20. Khedkar, V.M.; Ambre, P.K.; Verma, J.; Shaikh, M.S.; Pissurlenkar, R.R.S.; Coutinho, E.C. J Mol. Model. 2010, 16, 1251-1268. DOI: 10.1007/s00894-009-0636-5

21. Tie, Y.; Kovalevsky, A.Y.; Boross, P.; Wang, Y.; Ghosh, A.K.; Tozser, J.; Harrison, R.W.; Weber, I.T. Proteins: Structure, Function, and Bioinformatics 2007, 67, 232-242. DOI: 10.1002/prot.21304