X-ESS: An Automated Pipeline for Computational Drug Design

The traditional procedure for discovering new drugs is challenging and time-consuming, and often has a very low success rate. To date, scientists have had to go through a tedious process of research, selection and synthesis of drug candidates before these can be tested in multiple levels of pre-clinical and clinical trials. However, with the rise of new technology, in silico (computational) modeling is starting to take center stage in drug design and discovery. Prescience Insilico (PRinS3) is a forerunner in utilizing the power of computation (on-premise HPC and public cloud) and artificial intelligence to make the novel drug screening pipeline easy, efficient, economical and rapid.

Prescience Insilico has brought forward a modern integrative computational platform, PRinS3, which comprises five separate applications (Apps), namely BioIn, ChemIn, X-ESS, AI/SyMoG and X-HTVS. These Apps provide the core components required for smooth and systematic virtual drug screening. X-ESS comprises modules for molecular docking, molecular dynamics (MD) simulations, MM-PBSA and metadynamics for estimating the free energy barriers of binding ligands to targets in solvents like water. The company is also working on a series of other Apps to give researchers an all-inclusive experience of exploration. At the moment, X-ESS can manage large datasets (around 500 multi-target, multi-ligand combinations) for screening at a time and runs the various stages of the process by connecting the software to the best hardware resources available.

These calculations require a considerable amount of processing power, and in some cases a local workstation may fall short. Thus, in addition to the local machine, the software provides a data connector for connecting to any high performance computing (HPC) environment (e.g. the National Supercomputing Mission (NSM) facility at the Indian Institute of Technology Kanpur) or public cloud platform (e.g. Google Cloud), based on the user's requirements and choice. In this study, a system made up of protein-ligand complexes was screened on different hardware platforms (a workstation, the HPC at NSM - IITK, and Google Cloud Platform) in order to evaluate the performance and efficiency of X-ESS across variable computing hardware. The X-ESS and X-HTVS Apps can also be run on a hybrid platform.

Choice of Dataset

PAK1, or p21-activated kinase 1, is a pharmacologically important protein because of its involvement in a multitude of signaling pathways responsible for various cellular processes. Mutations in or overexpression of this gene can lead to diseases like IDDMSSD (intellectual developmental disorder with macrocephaly, seizures, and speech delay) and a variety of cancers. For this study, along with the wild-type protein, four mutant variants were used, carrying mutations at residues 299, 389, 393 and 423.

The significance of the selected mutations is outlined in the following table.

Mutated Residue Number    Significance of the Mutation
299 (Lys->Arg)            Decrease and abolition of kinase function
389 (Asp->Asn)            Can cause abolition of kinase activity
393 (Asp->Ala)            Abolishes the ability to auto-phosphorylate at Thr-423
423 (Thr->Glu)            Constitutive kinase activity and decrease of CDC42-stimulated activity

After selection of the protein, 51 FDA-approved drugs reported to have inhibitory activity against various kinases were chosen as ligands for mass screening. As there is still a lack of research on PAK1 as a potential target, only Fostamatinib among the chosen drugs/ligands was found to have been previously tested and approved by the FDA as an inhibitor of our protein of interest.

Computational Resources
GROMACS 2018.3 was built in single precision with GCC 8.3.0, FFTW 3.3.8 (single precision) and OpenMP multithreading, with GPU support through CUDA 10.2. GROMACS was patched with PLUMED 2.5.4. AutoDock 4.2.6 with MGLTools 1.5.7 was used for docking studies. We used Docker containers to deploy these packages on the host machines (Google Cloud (GC) and workstations). The Docker containers held everything the PRinS3 applications needed to run, including libraries, system tools, code, and runtime. The advantage of using Docker is that we can quickly deploy and scale applications into any environment with confidence that our code will run. This creates a highly reliable, low-cost way to build, ship, and run distributed applications at any scale. We built our Docker container on Debian with GPU support. It can be deployed on any Linux, Windows, or macOS computer.

Technical Specifications of Machines Used in this Study

Machine        Processor     Sockets x Cores    Threads/Core    Clock (GHz)    GPU (NVIDIA)
NSM            Intel Xeon    2 x 20             1               2.5            V100
Workstation    i9-9820X      1 x 10             2               3.30           GeForce RTX 2080 Ti
GC             Intel Xeon    1 x 4              2               2.0            Tesla T4

We install the PRinS3 GUI desktop application on one of the host machines, and from there it can use any machine in the network or in the cloud (e.g. Google Cloud). Figure 1 shows the PRinS3 setup and its communication with the servers in the network. Access to the cloud instances (compute nodes in the cloud) is seamless. We use edge-server computing to offload expensive computation to the servers in the network and/or in the cloud. The edge-side machine hosts the PRinS3 Docker containers, while the server side can either deploy containers or have customized non-Docker applications installed. In the case of customized servers, the dependent software and applications need to be installed, and the user provides the location of these dependencies in the GUI desktop application.

Figure 1. PRinS3 setup and its communication with the servers in the network. The UI is installed on the host’s desktop (or laptop). The UI bundles the applications and data connector (mechanism to connect to the servers). Along with the UI, a docker image is deployed as a container in the host desktop (or laptop). This host machine can also be used for performing the computations.

Testing the Power of Computational Resources

The benchmark dataset contains 255 protein-ligand combinations (5 targets x 51 ligands), built from the refined and processed high-resolution crystal structure of the PAK1 kinase (PDB ID: 3Q52), its four mutants, and 51 FDA-approved kinase-targeting drugs considered for repurposing. The refinement of the target proteins and retrieval of the data for the FDA-approved drugs were done using the PRinS3 software itself.
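As a rough illustration of how the 255 combinations arise, the sketch below pairs every target with every ligand. The target labels are shorthand derived from the mutation table earlier in this article, and the ligand identifiers are placeholders, since the 51 drug names are not listed here.

```python
from itertools import product

# Shorthand labels for the wild type and the four mutants (illustrative only).
targets = ["WT", "K299R", "D389N", "D393A", "T423E"]

# Placeholder ligand identifiers; the actual 51 drug names are not listed here.
ligands = [f"ligand_{i:02d}" for i in range(1, 52)]

# Every target is screened against every ligand.
combinations = list(product(targets, ligands))
print(len(combinations))  # 255
```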

Each step of the screening is divided into three phases: preprocessing and upload of data, job run, and download of the results. First, docking of each ligand into the corresponding binding pocket of the targets was ensured by providing the binding residue numbers, i.e., 299 and 389. The number of genetic algorithm (GA) runs was set to 100, the default for our system. After preprocessing, the files were uploaded to the selected server by the data connector (a proprietary tool developed by Prescience Insilico) and the docking jobs were run. Molecular docking in X-ESS is performed with AutoDock 4 using the Lamarckian genetic algorithm. After download and analysis of the docking data, 25 protein-ligand combinations were selected to take forward to MD. Advanced parameters were kept at their defaults and the NPT/NVT simulation time was set to 5 ns. For the MD simulations, the system was solvated with explicit water molecules using a water model. The estimated size of the whole system, in number of atoms, is given below.
System Details

Component    No. of Atoms
Protein      4640 +/- 3
Ligand       70 +/- 4
Water        91005 +/- 10
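
To put the system size in perspective, the short sketch below adds up the component counts from the table. It assumes the "+/-" values are independent spreads that can be combined in quadrature; the source does not state how they were obtained, so treat the combined spread as indicative only.

```python
import math

# (mean, spread) atom counts per component, taken from the table above.
components = {
    "protein": (4640, 3),
    "ligand": (70, 4),
    "water": (91005, 10),
}

total = sum(mean for mean, _ in components.values())
# Assumption: the spreads are independent, so they add in quadrature.
spread = math.sqrt(sum(s ** 2 for _, s in components.values()))
water_fraction = components["water"][0] / total

print(total)                     # 95715
print(round(spread, 1))          # 11.2
print(f"{water_fraction:.1%}")   # 95.1%
```

Roughly 95% of the atoms are solvent, which is typical of explicit-water simulations and explains why the MD and FES stages dominate the compute cost.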
After MD, a few of the 25 combinations were found to be more suitable for further evaluation. Based on the simulation scores, metadynamics-based Free Energy Sampling (FES) was performed on only 8 protein-ligand complexes. Each combination was scheduled for 5 independent runs, as per the default parameters in X-ESS. For both MD and FES, the same computational resources were used as for docking.
The time taken for preprocessing, simulation and data download at each step was noted and used as a gauge of the performance of the PRinS3 software. The procedure was repeated in exactly the same manner across three different resources, i.e., the NSM HPC, the cloud and the local workstation. The data from each case were analyzed and compared to assess the contribution of the computational resources to accelerating the processes.
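
One simple way to collect such per-phase timings is sketched below. This is not how PRinS3 records its timings (that is not described here); the helper names `timed` and `run_stage` are hypothetical, and the three callables are stand-ins for the real upload, job and download steps.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(phase, record):
    """Store the wall-clock duration of the enclosed block under `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        record[phase] = time.perf_counter() - start

def run_stage(preprocess_upload, run_job, download):
    """Time the three phases of one screening stage (hypothetical helper)."""
    record = {}
    with timed("preprocess_upload", record):
        preprocess_upload()
    with timed("runtime", record):
        run_job()
    with timed("download", record):
        download()
    return record

# Stand-in callables; real ones would call the data connector and job runner.
timings = run_stage(lambda: time.sleep(0.01), lambda: time.sleep(0.02), lambda: None)
print(sorted(timings))
```

Repeating `run_stage` for docking, MD and FES on each platform would yield one row of timings per stage and platform.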
Results and Discussion
To provide a simple overview of the efficiency of PRinS3, we recorded the duration of preprocessing/upload, runtime and data download for molecular docking, MD and FES on each of the three servers.
On the local workstation, preprocessing and upload of our system files to the server took nearly 33+/-1, 22+/-1 and 23+/-0.5 seconds for docking, MD and FES respectively. On the NSM server, the corresponding times were around 36+/-2, 26+/-5 and 25.5+/-0.5 seconds. On the cloud, the upload times for docking, MD and FES were 41+/-0.5, 37+/-1 and 37+/-4 seconds respectively.

Figure 2

In a similar fashion, the job runtimes and the time to download files to the local machine were also recorded and plotted (Figure 2). The data are tabulated below.

Computational resource    Stage of screening    Preprocessing/Upload (s)    Runtime (s)         Download (s)
Workstation               Docking               33 +/- 1                    3000 +/- 500        57 +/- 10
Workstation               MD                    22 +/- 1                    5000 +/- 50         36 +/- 16
Workstation               FES                   23 +/- 0.5                  45000 +/- 13000     26 +/- 6
NSM Server                Docking               36 +/- 2                    2000 +/- 500        70 +/- 5
NSM Server                MD                    26 +/- 5                    7000 +/- 50         47 +/- 10
NSM Server                FES                   25.5 +/- 0.5                12000 +/- 2000      37 +/- 11
Cloud                     Docking               41 +/- 0.5                  4000 +/- 500        71 +/- 1
Cloud                     MD                    37 +/- 1                    8500 +/- 100        59 +/- 12
Cloud                     FES                   37 +/- 4                    6000 +/- 1000       42 +/- 5
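
The comparison can be checked directly from the tabulated means. The sketch below transcribes the mean values from the table (ignoring the "+/-" spreads, which is a simplification) and reports the platform with the lowest mean runtime per stage, along with the summed time across all phases and stages. The helper names are illustrative.

```python
# Mean times in seconds, transcribed from the table above:
# (preprocessing/upload, runtime, download).
timings = {
    "Workstation": {"Docking": (33, 3000, 57), "MD": (22, 5000, 36), "FES": (23, 45000, 26)},
    "NSM Server": {"Docking": (36, 2000, 70), "MD": (26, 7000, 47), "FES": (25.5, 12000, 37)},
    "Cloud": {"Docking": (41, 4000, 71), "MD": (37, 8500, 59), "FES": (37, 6000, 42)},
}

def fastest_runtime(stage):
    """Return the platform with the lowest mean runtime for a stage."""
    return min(timings, key=lambda platform: timings[platform][stage][1])

def end_to_end(platform):
    """Sum upload, runtime and download over all three stages."""
    return sum(sum(phases) for phases in timings[platform].values())

for stage in ("Docking", "MD", "FES"):
    print(stage, fastest_runtime(stage))
for platform in timings:
    print(platform, end_to_end(platform))
```

Because the spreads are ignored, differences that fall within the reported "+/-" ranges should not be over-interpreted.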

For docking with the same dataset, preprocessing and data download took progressively longer on the workstation, the NSM server and Google Cloud, in that order. For running the docking jobs themselves, however, the NSM server took the least time and the cloud the most. In the case of MD, the workstation completed preprocessing, run and download in the least time, with the NSM server next and the cloud last. Interestingly, for the FES runs the workstation took considerably longer (45000+/-13000 s) than both the NSM server (12000+/-2000 s) and the cloud (6000+/-1000 s); going by these tabulated runtimes, the cloud was the fastest for FES, with the NSM server second. For preprocessing and download of the FES data, however, the time again increased from the workstation to the NSM server and then to the cloud.

Lobelia Ghosh