Is conformation a fundamental descriptor in QSAR? A case for halogenated anesthetics

An intriguing question in 3D-QSAR lies on which conformation(s) to use when generating molecular descriptors (MD) for correlation with bioactivity values. This is not a simple task because the bioactive conformation in molecule data sets is usually unknown and, therefore, optimized structures in a receptor-free environment are often used to generate the MD´s. In this case, a wrong conformational choice can cause misinterpretation of the QSAR model. The present computational work reports the conformational analysis of the volatile anesthetic isoflurane (2-chloro-2-(difluoromethoxy)-1,1,1-trifluoroethane) in the gas phase and also in polar and nonpolar implicit and explicit solvents to show that stable minima (ruled by intramolecular interactions) do not necessarily coincide with the bioconformation (ruled by enzyme induced fit). Consequently, a QSAR model based on two-dimensional chemical structures was built and exhibited satisfactory modeling/prediction capability and interpretability, then suggesting that these 2D MD´s can be advantageous over some three-dimensional descriptors.


Introduction
Quantitative structure-activity relationship (QSAR) studies try to find a correlation between chemical structures and the corresponding bioactivities by means of molecular descriptors (MD´s). In this way, molecular architecture and substitution patterns in a series of congeneric molecules are described by calculable or empirical data having some relationship with bio-logical activity, becoming the technique useful for understanding the action mechanism of related drugs, as well as to drive the synthesis of new drug like compounds.
Since the milestone work by Hansch and Fujita [1], a variety of MD´s have been developed to improve the correlation of chemi-cal structures with bioactivity, ranging from hydrophobicity to three-dimensional descriptors [2]. Indeed, the most popular QSAR methods are based on molecular descriptors generated from 3D molecular structures, such as the widely used techniques based on molecular field analysis [3,4]. The problem with most 3D-QSAR methods is that the bioactive conformation of the compounds in a data set is usually unknown and, therefore, geometry optimization is carried out in a receptorfree environment to generate the molecular structure and, subsequently, the 3D MD´s. While molecular conformation in vacuum is governed by intramolecular interactions, the bioconformation is ruled by enzyme induced-fit; consequently, optimized and bioactive geometries are probably different to each other and, to obtain insight on the action mechanism of a drug and substituent effects, MD´s should not be generated over geometries optimized in a receptor-free environment. Efforts have been made to attenuate the drawback of using a conformation that is possibly wrong, e.g., by using average conformations, ensemble and multidimensional methods [5][6][7][8], but the risk of chemical-biological misinterpretation remains. Receptor-dependent QSAR methods have also been developed [9], but these are mostly complementary and are aimed at corroborating and/or rationalizing the results provided by the regression models, since the docking methodology itself provides intermolecular energies and docking scores that correlate with bioactivity. On the other hand, despite not encoding conformational information, 2D-QSAR can incorporate other stereochemical properties and also account for group sizes, substituent type and 2D shape of molecules. These have shown to be sufficient parameters to obtain satisfactory correlation with bioactivity data and valuable understanding on structural requirements for drug development.
Multivariate Image Analysis applied to QSAR (MIA-QSAR) is a genuine 2D method, since descriptors are pixels of molecular projections drawn as black and white wireframes [10] and, more recently, as colored 2D-images (chemical structures) with spheres representing atoms with sizes proportional to the respective van der Waals radii [11]. The variance in the atom color and coordinate of pixels in the images (the structural variance) explains the changes in the bioactivities block. Thus, the MIA-QSAR method is an appropriate approach to probe the validity of 2D-QSAR methods when the molecules in a data set undergo rotational isomerization.
In order to test our hypothesis, the biological activities of a set of volatile halogenated anesthetics were modelled using the recent version of the MIA-QSAR method. In addition, the suitability of applying 3D information obtained from high level calculations (in a receptor-free environment) in QSAR modelling was evaluated using a comparative study of the optimized and bioactive conformations of the fluorinated anesthetic isoflurane, which binds to a 4-helix bundle protein (apoferritin) [12] and to the integrin LFA1 enzyme [13].

Results and Discussion
Conformational analysis 2-Chloro-2-(difluoromethoxy)-1,1,1-trifluoroethane (isoflurane) undergoes rotational isomerization around two dihedral angles (Cl-C-O-C and C-O-C-H) and, considering three limit orientations for each of them (gauche, gauche' and anti), nine conformations are possible for isoflurane. However, geometry optimization at the MP2/6-311++g(d,p) and ωB97X-D/6-311++g(d,p) (a DTF method which includes dispersion effects) levels converged to five energy minima for the gas phase, implicit solvents (cyclohexane, DMSO and water, using the polarizable continuum model), and using one explicit water molecule as solvent to mimic a physiological medium ( Figure 1).
The results for the gas phase are in agreement with previous calculations [14] and microwave experiments [15], where five conformations were found, but only three could be experimentally detected, due to their lower energies. These correspond to conformers 1, 2 and 3 of Table 1, the most stable ones in the gas phase and implicit solution. It is worth mentioning that, according to our calculations, the solvent has a little effect on the conformer populations, such as in enflurane [16], suggesting that intramolecular interactions govern the conformational equilibrium in a biological-free environment. Indeed, Lesarri et al. [15] pointed out that anomeric effects owing to donor-acceptor (LP → σ*) interactions are responsible for the conformational preference of isoflurane. Despite the contribution from specific hyperconjugative interactions representing the anomeric effects (mainly LP O → σ* CF , Table 2), our natural bond orbital (NBO) analysis indicates that the main conformers 1 and 2 are less favored by electronic delocalization than 3, 4 and 5. Since the overall energy of a system is a composition of repulsive Lewis (steric and electrostatic) and attractive non-Lewis-type (hyperconjugation) contributions, the main factor governing the conformational stabilization of isoflurane comes from the more classical steric and dipolar interactions, likewise sevoflurane reported earlier [17].
It has been proposed that intramolecular CH•••FC hydrogen bond drives the conformational preference of enflurane [18]. In order to check if such an interaction operates in isoflurane, QTAIM (Quantum Theory of Atoms in Molecules) calculations were performed for 1-5. The molecular graphs of Table 3 do not exhibit F•••H or Cl•••H bond paths (only weak nonbonding interactions are observed for the high energy conformers 4 and 5). According to Koch and Popelier [19], the following QTAIM     parameters, obtained by integration over the atomic basins of the hydrogen atoms participating in hydrogen bonds, should be observed to characterize hydrogen bonds along with a bond path: loss of atomic charge (q), decreased first dipole moment (M 1 ), decreased atomic volume (V) and increased atomic energy (E). Since these criteria do not vary significantly among the conformers (either when H approaches to Cl and F or not), we can confirm that hydrogen bonding is not a decisive interaction for the conformational equilibrium of isoflurane.
Using a solvation model with explicit water, where specific solute-solvent interactions take place, conformer 1 is again the most stable form, suggesting that implicit solvation describes satisfactorily the actual conformational isomerism. However, is there an environment capable of overcoming intramolecular interactions and then changing the conformational preference of isoflurane? While comparison of 1 with the bioconformation of isoflurane bound to the integrin LFA1 enzyme (PDB code: 3F78 [13]) reveals small differences in the Cl-C-O-C and C-O-C-H dihedral angles (±8° and 17°, respectively), the isoflurane structure bound to apoferritin (PDB code: 1XZ3 [9]) does not match any optimized conformer ( Figure 1). So, does it make sense to use enzyme-free optimized conformations to obtain biochemical insights from 3D-QSAR? The following MIA-QSAR modeling can help to elucidate this question and to support the use of 2D approaches with the aim for obtaining biochemical information.

MIA-QSAR of anesthetic haloethers
MIA-QSAR is a genuine 2D technique in the sense that it takes into account 2D projections of chemical structure images to generate MD's. Each compound of a congeneric series is aligned to one another by a congruent motif and, therefore, the variable substructure along the series explains the variance in the bioactivities block. The variance in the chemical structures is captured by the different coordinates of the pixels composing the images ( Figure 2 shows the superimposed images used in this work to illustrate the structural variance). Pixels also vary in color, depending on the atom to which they refer. Since pixel values are a summation of RGB (red-green-blue) components and each channel is numerically equivalent to 255, the whole spectrum of colors can vary from 0 (black) to 765 (white). The pixel value (atom color) can be managed according to atomic properties and, therefore, each different atom in the series of 25 haloethers of Table 4 [20,21] was colored according to the respective Pauling's electronegativity (ε) scale, since polar properties should help to modulate their mode of interaction with an enzyme (see Table 5 for correspondence of ε with pixel values and approximate colors). Another useful parameter ruling bioactivity is the steric effect, which was accounted for by representing atoms in the molecules as colored spheres with sizes proportional to the corresponding van der Waals radii. The molecules were constructed using the GaussView program [22] and each 2D projection (each image, molecule) with 342 × 300 pixels dimension was unfolded to a row vector. Combination of the 25 images yielded a data matrix, which was regressed against the bioactivities column vector (pMAC, the negative logarithm of the partial pressure capable of suppressing the movement in response to noxious stimuli in 50% of rats) using partial least squares (PLS) regression. The MIA-QSAR model was constructed over 19 training set compounds and externally validated using the remaining 6 compounds (7, 8, 9, 15, 17 and  19), which were chosen using Kennard-Stone sampling. A good correlation between MIA descriptors and the pMAC values for the series of anesthetic haloethers indicates that 2D chemical structures encode biological activities. A previous study using log P (the octanol/water partition coefficient) as descriptor showed a good correlation with pMAC [21], but the resulting model was not validated. Thus, the present study provided an internal (through leave-one-out cross-validation) and external validation to attest the reliability of the MIA-QSAR model. This was checked through the respective root mean square errors (RMSE) and the determination coefficients  for the plot of actual vs predicted pMAC (q 2 and r 2 test > 0.5 are considered acceptable). Also, since the MIA-QSAR model was obtained from PLS regression, a robustness test (y-randomization test) was performed to guarantee that calibration was not overfitted nor obtained by chance correlation; the y block was randomized and subsequently regressed against the intact matrix (ten times). Reliable models are achieved when r 2 y-rand << r 2 , which is evaluated by c r 2 p , defined as c r 2 p = r × (r 2 − r 2 y-rand ) 1/2 [23]. Values above 0.5 for c r 2 p are considered acceptable. The statistical results for 9 latent variables (PLS components) illustrated in Figure 3 attest the predictability and reliability of the MIA-QSAR, thus suggesting that the 2D chemical structure indeed encodes biochemical properties and that MIA descriptors can be useful to anticipating pMAC of prospective congeneric drug-like candidates.
MIA-QSAR and 3D-QSAR have already been compared to each other in a variety of studies and the prediction results are similar [10,[24][25][26], but the confidence in the interpretation provided by conformation-dependent methods is questionable as the minimum conformation may not necessarily represent the bioactive conformation. Both MIA-QSAR and 3D-QSAR methods are based on alignment rules and, therefore, they should be applied to congeneric series. Non-congeneric data sets are not within the applicability domain of MIA-QSAR and alignment-based methods and, since the alignment does not apply for these types of sets, the conformational landscape would affect only some specific descriptors used in classical QSAR, e.g., the 3D matrix-based descriptors available in the Dragon software.
Thus, a major goal in QSAR is to determine and interpret the chemical motifs/properties responsible for the observed biological effects. We have checked that even the simple log P model yields good prediction ability upon selection of test set compounds using the Kennard-Stone sampling (r 2 test = 0.80); however, the main drawback of this analysis lies on the vague notion of the group types and/or molecular positions that most affect the pMAC values. Thus, the MIA descriptors were searched as source of chemical interpretation for the QSAR model. Because of the numerous MIA descriptors generated to build the model, a straightforward analysis would not be an easy task using the raw data. Thus, pre-filtration procedures were performed in order to reduce the number of variables. In this sense, the first approach was a measure of Shannon's entropy (SE), corresponding to an unsupervised classification variables filtering [27,28] applied to a 25 discrete intervals scheme. Variables with less than 10% of the maximum SE (SE MAX = log 25) were discarded. The variables were filtered again through the correlation coefficient (x/x), with variables removed for each set of variables with a x/x = 0.98 [29]. As the data set under study is highly correlated, only two variables (X1876 and X5979) were identified containing further information regarding the anesthetic activity (pMAC). A careful analysis of the reduced matrix (Table 6) reveals the approximate positions of pixels X1876 and X5979 in the structural scaffold. Therefore, principal component analysis (PCA) was applied to obtain information on how these coordinates affect the bioactivity in terms of the two principal components PC1 (56.59%) and PC2 (43.41%) (Figure 4). Since PCA is a pattern recognition tool, the set of compounds were classified in three levels of anethestic intensity: low (pMAC ≤ 1.00 ), medium (1.00 < pMAC < 2.00), and high (pMAC ≥ 2.00) levels. From the PCA scores plot shown in Figure 4, three clusters are observed: one at positive scores in PC1 (compounds with moderate-to-high activities), another with negative scores in PC1 and around null scores in PC2 (compounds with moderate-tolow activities), and the third one at negative scores in PC1 and very positive scores in PC2 (compounds with low activities). Variables X1876 and X5979 in the loadings plot are responsible for clustering these compounds according to the scores plot.
X5979 clearly separates compounds 22 (moderate activity) and 24 (high activity) from the remaining ones in PC2, due to their high pixel value at this position (765, white -blank space in the image). Bulk and hydrophobic substituents (R) at position R 4 (surrounding X5979, see substituent numbering in Table 4) have long C-R bonds and, therefore, the pixel variable is located where small substituents (such as hydrogen, fluorine and even chlorine) appear, while the bromine atom does not occupy this coordinate because of its long bond distance with carbon. From this, since compounds 22 and 24 contain a bromine substituent at position R 4 and pertain to the moderate-to-high activity class, such an halogen tends to favor an improved anesthetic activity of congeneric haloethers in this position.
PC1 explains most of the data variance and, therefore, variable X1876 should play a significant role for the bioactivity pattern of the data set compounds, because of its high loading in PC1.
In addition to 22 and 24, compounds 14, 21 and 25 (also pertaining to the moderate-to-high activity class) lie in the region with positive scores in PC1, while, e.g., compounds 7 and 18 (low activity compounds) have very negative scores in PC1. In this case, the pixel value 765 at position X1876 indicates the absence of a substituent (hydrogen at R 1 ) and, consequently, atoms encoded by high pixel values (e.g., 400 for the small and electronegative fluorine atom, but mainly for the small hydrogen atom) at R 1 tend to strongly favor the increase in pMAC of anesthetic derivatives.
Because the absence of intermolecular hydrogen bonds between enflurane (a similar anesthetic haloether) and the integrin LFA1 enzyme [16], the structural requirements mentioned above should be dictated by hydrophobic interactions of R 4 with amino acid residues and by low steric effects surrounding R 1 . Thus, 2D structural information provided by MIA descriptors (particularly related to the connectivity of different atoms in the present case) was capable of modeling and interpreting biochemical information of a series of anesthetic haloethers, without considering the conformation.

Conclusion
It has been shown that a predictive and biochemically interpretable QSAR model can be obtained through bidimensional descriptors. Conformational insights could refine the analysis, but the risk of using wrong conformations could cause misinterpretation of the results. This eminent risk was assessed by investigating the conformational isomerism of isoflurane, whose bioconformation in at least one biological target is significantly different from the geometries optimized in a biological-free environment. A more secure option would be to obtain the ligand geometries inside an enzyme active site. Since conforma- tional search inside a receptor normally gives the mode of interaction between substrate and enzyme, as well as the intermolecular interaction energy (related to the ligand-receptor affinity and, consequently, to the bioactivity), further QSAR analysis would be of limited utility.