Synthesis and in silico screening of a library of β-carboline-containing compounds

The synthesis of a library of tetrahydro-β-carboline-containing compounds in milligram quantities is described. Among the unique heterocyclic frameworks are twelve tetrahydroindolizinoindoles, six tetrahydrocyclobutanindoloquinolizinones and three tetrahydrocyclopentenoneindolizinoindolones. These compounds were selected from a virtual combinatorial library of 11,478 compounds. Physicochemical properties were calculated, and most of the compounds comply with Lipinski's rules. Virtual docking and ligand-based target evaluations were performed for the β-carboline library compounds and selected synthetic intermediates to assess the therapeutic potential of these small organic molecules. These compounds have been deposited into the NIH Molecular Libraries Small Molecule Repository (MLSMR) and may target proteins such as histone deacetylase 4, endothelial nitric oxide synthase, 5-hydroxytryptamine receptor 6 and mitogen-activated protein kinase 1. These in silico screening results are intended to add value to the β-carboline library for those interested in probes of these targets.


Introduction
Identification of a comprehensive set of small organic molecules capable of selectively modifying the function of biological targets tremendously impacts modern medical research and drug discovery efforts [1]. Currently, this set of small molecules is largely occupied by in-house libraries and commercially available compounds. The NIH Roadmap initiative was established to address a recognized limitation of current compound diversity, resulting in the Molecular Libraries Probe Centers Network (MLPCN), which has, since its inception, garnered a library of over 370,000 chemically diverse small molecules in a central molecule repository [2]. This supply of compounds has been made possible by researchers across the disciplines, but largely by synthetic chemists who are preparing compounds with an eye towards biologically relevant targets. Another goal of the NIH Roadmap is the development of enabling methods for the synthesis of these structurally diverse compound libraries; amongst these methods, skeletal diversification strategies have emerged as particularly efficient for maximizing structural diversity [3].
Previous work in the Brummond laboratory has demonstrated that an allene-containing β-carboline provided a good starting point for synthesizing six novel types of hetero-frameworks, all skeletally unique [4]. Moreover, scope and limitation studies contributed to an understanding of chemistries that would possess the robustness necessary for library preparation. Information gained from these experiments was then utilized in the construction of a virtual library of 11,748 compounds. A diversity analysis was performed using BCUT metrics (Burden, CAS, and Pearlman at the University of Texas) and Tanimoto coefficients (Tc), and this virtual compound library was mapped onto the existing chemical space of the NIH Molecular Libraries Small Molecule Repository (MLSMR) [4]. When considering the physical properties most important to bimolecular interactions (atomic Gasteiger-Hückel charges, polarizabilities, and hydrogen-bond acceptors), these virtual compounds were found to occupy new chemical space when compared to the 327,000 compounds in the MLSMR. A small subset of these compounds was subsequently identified as representing a maximally diverse chemical space. The synthesis of a modified subset of this virtual compound library is described herein, where modifications were mainly driven by studies of compound stability. Furthermore, a high-throughput, in silico screening analysis of this library identified a number of potential biological targets for the compounds.

Results and Discussion
Scaffolds 1, 2 and 3 (Figure 1) were chosen for library preparation based upon favorable Tanimoto coefficient (Tc) scores when compared to the MLSMR, conformational constraints imposed by the β-carboline moiety, and the number of building blocks available for the diversifying elements R1 and R2.
The syntheses of tetrahydro-β-carbolines 6{1-16} were accomplished in a manner entirely analogous to that reported previously (Table 1, entries 1-16) [4]. For example, the allenic methyl ester of tryptophan 4 was reacted with a number of aldehydes 5{1-15} under acidic conditions to produce the corresponding products in yields ranging from 54 to 89%. A range of aldehydes was accommodated in the Pictet-Spengler reaction, including formaldehyde (Table 1, entry 1), alkyl aldehydes (Table 1, entries 2 and 3), aryl aldehydes with electron-withdrawing and electron-donating groups (Table 1, entries 4-7, 14 and 15), heteroaromatic aldehydes (Table 1, entries 8-13) and glyoxalates (Table 1, entry 16). Moreover, useful quantities of β-carboline-containing products were obtained (43-100 mg). For entries 2-15, mixtures of two diastereomers were obtained. Since the mixtures could not be readily separated by column chromatography, diastereomeric ratios were determined by 1H NMR and the mixtures were advanced without further purification. Reaction of allene 6 under the silver-nitrate-mediated cyclization conditions afforded the desired fused pyrrolines 1. However, in the initial studies of this cyclization, a color change was noted during purification. Indeed, when NMR stability studies were performed on the syn- and anti-pyrrolines 1{5}, decomposition of both diastereomers was evident. Although it was generally difficult to isolate the individual diastereomers, they could be separated by column chromatography. It was found that anti-1{5} decomposed more rapidly than syn-1{5} during the 1H NMR stability studies, when compared to an internal standard. These results, combined with previously reported skeletal reorganization processes of functionalized β-carbolines, led to concerns about the long-term storage of these compounds and their inclusion in the MLSMR [5].
To increase the stability of this class of compounds, a toluenesulfonyl group was added to the indole nitrogen of 1{1-7} to give N-tosyl-tetrahydro-β-carbolinepyrroline derivatives 7{1-7}. These tosylated derivatives exhibited improved stability as evidenced by 1H NMR (Table 1, entries 1-7, and Supporting Information File 1, S76-S81). Incorporation of the tosyl group also eased the chromatographic separation of the syn- and anti-isomers for entries 2-7; thus, compounds 7{2-7} were obtained as single diastereomers. Low to moderate yields for this two-step reaction sequence were attributed to a problematic tosylation due to the sterically hindered nature of the indole nitrogen atom. Moreover, unforeseen limitations were encountered for the heteroaromatic and naphthyl-containing β-carboline intermediates (Table 1, entries 8-15). While in some cases the intermediate pyrrolines 1 were observed (Table 1, entries 8, 10, 14, 15), the corresponding tosylated products were not obtained.
The heteroaromatic examples (Table 1, entries 9, 11-13) did not undergo cyclization upon treatment with silver nitrate. For these cases, it was assumed that competing coordination of the heteroatom to the silver ion was an issue; however, attempts were not made to alter the reaction conditions for these substrates. Furthermore, conversion of the naphthalene-containing analogues 1{14} and 1{15} to their corresponding tosylates was not successful.
The majority of these β-carboline-containing products exhibit acceptable calculated physicochemical properties in accordance with Lipinski's rule of five (Figure 2) [9,10]. These favorable properties and structural novelty make these compounds valuable candidates for deposition in the MLSCN for biological activity evaluation.
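As an illustration of how such a rule-of-five assessment is commonly carried out, the following minimal Python sketch counts violations of the four standard Lipinski criteria from precomputed descriptors. The `Descriptors` container, the function names and the example values are hypothetical and are not taken from this work; the tolerance of one violation follows the usual statement of the rule, not a setting reported here.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int      # number of N-H and O-H groups
    h_bond_acceptors: int   # number of N and O atoms


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Accept a compound with at most `max_violations` violations."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    # Illustrative values only; these are not measured or calculated data.
    example = Descriptors(mol_weight=450.0, logp=3.2,
                          h_bond_donors=1, h_bond_acceptors=6)
    print(lipinski_violations(example), passes_rule_of_five(example))
```

In practice the descriptors would come from a cheminformatics package; they are passed in directly here so that the sketch stays self-contained.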
Diversity-oriented synthesis (DOS) has been employed to generate thousands of the organic compounds that have been deposited in the NIH molecular repository for medicinal chemistry research. Deciphering the therapeutic potential of so many compounds is a continuing challenge. By combining chemogenomics databases, such as the Protein Data Bank (PDB) and ChEMBL, it is possible to map new compounds into existing chemical space and to predict their protein targets. Two complementary strategies can be implemented for this purpose. One is a structure-based docking strategy, in which a query compound is fit into a series of protein binding pockets to identify favorable compound-protein interactions. The second is a ligand-based strategy, in which the structural similarities between a query compound and a collection of bioactive compounds are identified. In the present study, both of these strategies were used to predict potential targets of the newly synthesized library of β-carboline-containing compounds.

High-throughput docking studies for protein-target prediction of newly synthesized compounds
Molecular docking studies were performed with the 34 newly synthesized compounds, represented by scaffolds 1, 2 and 3 and allenyl precursors 4, 6{1-16} and 9{1-4}, to identify potential protein targets [11]. Protein structures were downloaded from the PDB [12] and the analysis was limited to a selection of the 607 proteins defined as "druggable" targets, in order to reduce computational time [13]. (The complete listing of these proteins and their PDB IDs is provided in Supporting Information File 2.) The Surflex-Dock module of the Sybyl software was employed for protein preparation and docking of the β-carboline library [14,15]. Water molecules and ligands were removed from the protein structures, and the active site of each protein was defined by the corresponding residues around the cocrystallized ligands. In-house algorithms were used to evaluate ligand-docking efficiency, and docking scores were used to assess and rank the protein targets.
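The ranking step described above can be sketched as follows. This is a minimal illustration and not the in-house algorithm used in this work. It assumes that the docking score is expressed in -log10(Kd) units, an interpretation consistent with the Figure 3 caption, in which scores above 7.0 correspond to Kd values below 100 nM. The function names, compound labels, target labels and scores are all hypothetical.

```python
def score_to_kd_nM(score: float) -> float:
    """Convert a docking score, assumed to be -log10(Kd in M), to Kd in nM."""
    return 10.0 ** (-score) * 1e9


def rank_hits(score_matrix: dict, threshold: float = 7.0) -> list:
    """Return (compound, target, score) triples above `threshold`,
    sorted from the highest score to the lowest."""
    hits = [
        (compound, target, score)
        for compound, row in score_matrix.items()
        for target, score in row.items()
        if score > threshold
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    # Made-up scores for two placeholder compounds and three placeholder targets.
    matrix = {
        "cmpd_A": {"target_1": 7.8, "target_2": 5.1, "target_3": 6.9},
        "cmpd_B": {"target_1": 4.3, "target_2": 8.2, "target_3": 7.0},
    }
    for compound, target, score in rank_hits(matrix):
        print(compound, target, score, round(score_to_kd_nM(score), 1))
```

A strict "greater than" comparison is used so that a score of exactly 7.0 is not counted as a hit, matching the wording "larger than 7.0" in the figure caption.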
A portion of the protein-scoring matrix is illustrated in Figure 3. Several interesting results emerged from this in silico analysis: (1) Twenty of the new compounds have docking scores greater

Ligand-based strategy for target prediction
Ligand-based target prediction algorithms have been developed based upon an established medicinal chemistry principle that structurally similar compounds, with comparable physical properties, should convey related biological properties [16,17]. In this study, structural similarities were calculated between the compounds of the β-carboline library and the bioactive compounds in the well-annotated database ChEMBL version 13, the largest publicly available compound-target database, containing 1,143,682 distinct compounds, 8,845 targets and 6,933,068 bioactivity entries from 44,682 publications and PubChem bioassays [18,19]. The Open Babel FP2 fingerprint was used as a descriptor to assess similarities between molecules [20]. Tanimoto coefficients were calculated between the compounds of the β-carboline library and the ChEMBL database, and only β-carboline compounds with a Tc greater than 0.60 were considered for bioactivity analysis. This relatively low Tc threshold was used to identify a larger number of bioactivity targets. Table 4 lists the most promising bioactive targets for the newly synthesized β-carbolines, together with the structurally similar lead compounds in ChEMBL, their reported potency and literature citations. Several interesting results emerge from this comparison, including a number of targets that the compounds should be screened against, such as C-C chemokine receptor type 3, gamma-aminobutyric acid receptor subunit gamma-2, breast adenocarcinoma cells, 5-hydroxytryptamine receptor 6, angiotensin-converting enzyme, and DNA polymerase iota. Moreover, nine of the twelve compounds are represented by allene precursors, ones that were not originally considered in the diversity analysis.
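The similarity filter described above can be illustrated with a minimal Python sketch. Fingerprints are represented here as sets of "on" bit positions, a toy stand-in for the Open Babel FP2 fingerprints used in this work, and the Tanimoto coefficient is the number of shared bits divided by the number of bits set in either fingerprint. The function names, reference labels and bit sets are hypothetical; only the 0.60 cutoff is taken from the text.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def similar_references(query_fp: set, reference_fps: dict,
                       threshold: float = 0.60) -> list:
    """Return (reference_id, Tc) pairs with Tc greater than `threshold`,
    sorted from the most to the least similar."""
    scored = [(ref_id, tanimoto(query_fp, fp))
              for ref_id, fp in reference_fps.items()]
    hits = [(ref_id, tc) for ref_id, tc in scored if tc > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy bit sets; real fingerprints would be computed from structures.
    query = {1, 2, 3, 4}
    references = {"ref_A": {1, 2, 3, 4, 5}, "ref_B": {1, 9}}
    print(similar_references(query, references))
```

Each reference compound that passes the cutoff would then be looked up in the bioactivity database, and its annotated targets proposed as candidate targets for the query compound.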

Conclusion
A library of 34 β-carboline-containing compounds was synthesized utilizing a skeletal diversification strategy. High-throughput docking and ligand-based protocols were implemented to predict potential biological targets of the newly synthesized β-carbolines. The docking approach uses a structure-based technology to predict preferred interactions between compounds and protein targets, whereas the ligand-based method uses ligand similarity coefficients to identify potential biological targets. The complementary nature of these two protocols is evidenced by the fact that there was no overlap in the predicted biological targets. Furthermore, the in silico screening of these compounds is intended to add value to the library by directing them to appropriate biological assays. Such strategies can also be used to explore the mechanism of action of a compound that is active in a bioassay but whose molecular target is as yet unidentified.

Figure 3: Results of high-throughput docking analysis. Top: A docking-score matrix arranged by compound IDs and PDB IDs; bottom: Structures of known ligands HR2 and TGF and the newly synthesized compounds 2{1,5} and 2{1,7}. Docking scores larger than 7.0 are colored red and can be mapped to Kd values less than 100 nM. The corresponding protein names of PDB IDs and the full docking-score matrix are listed in Supporting Information File 1.
a Isolated yields; b purity determined by LCMS/ELS.

Table 4: Potential targets of β-carbolines based upon bioactivity data in ChEMBL.