Discovery of unguisin J, a new cyclic peptide from Aspergillus heteromorphus CBS 117.55, and phylogeny-based bioinformatic analysis of UngA NRPS domains

Several under-explored Aspergillus spp. produce intriguing heptapeptides containing a γ-aminobutyric acid (GABA) residue with as yet unknown biological functions. In this study, a new GABA-containing heptapeptide, unguisin J (1), along with the known unguisin B (2), was isolated from a solid culture of Aspergillus heteromorphus CBS 117.55. The structure of compound 1 was elucidated by extensive 1D and 2D NMR spectroscopic analysis, including HSQC, HMBC, COSY, and 2D NOESY, as well as HRESIMS. The stereochemistry of 1 and 2 was determined by Marfey’s method. A biosynthetic gene cluster (BGC) encoding unguisins B and J was compared to characterized BGCs in other Aspergillus spp. Since the unguisin family of heptapeptides incorporates different amino acid residues at different positions of the peptide, the A and C domains of the UngA NRPS were analyzed in an attempt to understand the lack of substrate specificity observed.

NRPS enzymes are large multifunctional enzymes that often synthesize very important bioactive molecules [11,12]. These enzymes consist of several catalytic domains organized into modules. Typically, a module possesses an adenylation (A) domain for selecting and activating amino or keto acids, a thiolation (T) domain for shuttling intermediates between catalytic domains, and a condensation (C) domain that catalyzes amide or ester bond formation. Additional common domains include epimerization (E) domains for converting naturally occurring ʟ-amino acids to ᴅ-amino acids, methyltransferase (MT) domains that typically methylate specific N atoms, and terminal condensation (CT) domains, which cyclize the growing peptide chain and facilitate release from the NRPS. Of the fungal NRPSs studied to date, many appear to have some tolerance for the range of amino acids incorporated by the A domains, and the C domain has been highlighted as a gatekeeper [13].
Here, we describe the isolation of unguisin B, and a new congener named unguisin J, from Aspergillus heteromorphus CBS 117.55. We also perform bioinformatic analysis of the A and C domains of the UngA NRPS enzymes involved in their biosynthesis to try to rationalize the relaxed substrate specificity observed in this family of heptapeptides.

Results and Discussion
The cultivation of A. heteromorphus CBS 117.55 on rice solid medium yielded an organic-soluble extract, which was subjected to fractionation using preparative HPLC-PDA-ELSD and purification by semipreparative HPLC-PDA; this led to the isolation of a new cyclic peptide 1, along with unguisin B (2, Figure 2). The structure of the new compound 1 was elucidated by 1D and 2D NMR and HRESIMS/MS. Unguisin B was identified by the 1 H and 13   3). The other six amino acid residues were assigned based on 2D NMR spectra (1H-1H COSY, HSQC, and HMBC) as Ala (2 equiv), Phe (1 equiv), Leu (1 equiv), Val (1 equiv), and γ-aminobutyric acid (GABA) (1 equiv). In addition to the COSY and HMBC correlations, the NOESY experiment showed important interactions between the NH signals, corroborating the peptide sequence, which was defined as Ala-1, Val-2, Leu-3, Phe-4, Ala-5, Trp-6, and GABA-7 (Figure 3).
Analysis of the NMR data of 1 allowed the identification of characteristic 1 H and  3), together with key NOESY interactions between the NH signals at δH 7.98↔8.44↔8.14.
Compound 1 was named unguisin J. A second peptide was isolated from the same culture of A. heteromorphus CBS 117.55. Compound 2 was obtained as an amorphous white powder, [α]D +37 (c 0.1, EtOH) [lit. +40 (c 1.0, EtOH)] [5]; for the 1H and 13C NMR spectroscopic data, see Table S2 in Supporting Information File 1. By comparison with literature data, this compound was identified as unguisin B (2) [1,5], further corroborating the identification of the new unguisin J (1).
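As a cross-check on the residue assignment for 1, the molecular formula and monoisotopic mass implied by a cyclic heptapeptide built from Ala (2×), Val, Leu, Phe, Trp, and GABA can be computed directly. The following is a minimal sketch, not part of the original analysis: the helper names are hypothetical, the atomic masses are standard monoisotopic values, and the result is a calculated value to be compared with the measured HRESIMS data rather than a reported one.

```python
# Standard monoisotopic atomic masses (u) and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647

# Elemental composition of each residue as it occurs inside a peptide
# (free amino acid minus H2O). GABA is 4-aminobutanoic acid, C4H9NO2.
RESIDUES = {
    "Ala": {"C": 3, "H": 5, "N": 1, "O": 1},
    "Val": {"C": 5, "H": 9, "N": 1, "O": 1},
    "Leu": {"C": 6, "H": 11, "N": 1, "O": 1},
    "Phe": {"C": 9, "H": 9, "N": 1, "O": 1},
    "Trp": {"C": 11, "H": 10, "N": 2, "O": 1},
    "GABA": {"C": 4, "H": 7, "N": 1, "O": 1},
}

def cyclic_peptide_formula(sequence):
    """Sum residue compositions; a head-to-tail cyclic peptide has no terminal H2O."""
    total = {"C": 0, "H": 0, "N": 0, "O": 0}
    for residue in sequence:
        for element, count in RESIDUES[residue].items():
            total[element] += count
    return total

def monoisotopic_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())

# Residue order as assigned in the text for compound 1.
unguisin_j = ["Ala", "Val", "Leu", "Phe", "Ala", "Trp", "GABA"]
formula = cyclic_peptide_formula(unguisin_j)
neutral_mass = monoisotopic_mass(formula)
mz_protonated = neutral_mass + PROTON  # calculated [M+H]+

print(formula)
print(round(neutral_mass, 4), round(mz_protonated, 4))
```

Because the peptide is cyclic, no water is added for free termini; a linear heptapeptide with the same residues would be heavier by one H2O.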
To the best of our knowledge these are the first metabolites reported from A. heteromorphus CBS 117.55.
The co-isolation of unguisins B and J indicates that module 4 of the NRPS is able to accept two different amino acid substrates and so may possess subtle differences from UngA from A. violaceofuscus CBS 115571, which has relaxed substrate specificity in module 3. We performed genome mining of the publicly available A. heteromorphus CBS 117.55 genome (accession number MSFL00000000.1) [15] using fungiSMASH and identified a four-gene BGC encoding a seven-module NRPS, an alanine racemase, a hydrolase, and a transporter. We named this BGC ung'' to distinguish it from the ung BGC present in A. violaceofuscus and the ung' BGC in A. campestris IBT 28561, which encodes unguisins H and I [5]. Clinker analysis with the ung BGCs from A. violaceofuscus CBS 115571 and A. campestris IBT 28561 indicated a high level of homology (Figure 4). The biosynthesis of unguisins B and J is therefore proposed to arise from this single BGC, similar to the biosynthesis of unguisins A and B in A. violaceofuscus CBS 115571 (Scheme 1).

Scheme 1: Proposed biosynthesis of unguisins B and J in A. heteromorphus CBS 117.55.
Within the unguisin family, there is variability in the amino acids incorporated at positions 2-6 (Figure 1); however, there are usually only one or two residue differences between molecules that are co-isolated from each source, e.g., A and B from A. violaceofuscus CBS 115571 [5]; A, B, and C from Emericella unguis [1]; A, E, F, and G from Aspergillus candidus NF2412 [4]; H and I from A. campestris IBT 28561 [5]; and B and J from A. heteromorphus CBS 117.55 (Figure 1). This implies that only one or two modules per NRPS possess a noticeable level of relaxed substrate specificity. To explore this observation, the A and C domains were identified in UngA, UngA', and UngA'', and phylogenetic analysis of the A and C domains was performed (Figure 5 and Figure 6).
The A domains do not clade according to substrate specificity; instead, they clade according to the module from which they were extracted. The A domains from modules 2, 3, and 4, which have relaxed substrate specificity, do appear to have evolved differently from the A domains of modules 1, 5, 6, and 7 (Figure 5). Perhaps unsurprisingly, the domains from UngA and UngA'', which both synthesize unguisin B, were more closely related than those from UngA', despite differences in substrate specificity in modules 3 and 4. Previously, Matsuda et al. had compared the putative non-ribosomal codes for the UngA and UngA' A domains and also observed that conventional approaches are inadequate to understand or predict the specificity of fungal A domains [5].
The clades formed by the C domains showed higher divergence than those of the A domains, with the CT domains forming their own branch and the C domains from modules 1 and 3 clearly distinct from those from modules 2, 4, 5, and 6 (Figure 6). This separation of the non-terminal C domains could be due to modules 1 and 3 lacking an E domain. Again, the domains from UngA and UngA'' were more closely related than those from UngA', regardless of which two amino acids were condensed.
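The observation that domains group by module rather than by substrate can be illustrated with a much cruder proxy than a phylogenetic tree: for each domain in an alignment, find its nearest neighbour by p-distance and ask whether that neighbour shares its module or its substrate. The sketch below is only an illustration under stated assumptions. It is not the tree-building method used in this study, the function names are hypothetical, and the four sequences are made-up placeholder strings, not real UngA domain sequences.

```python
def p_distance(a, b):
    """Fraction of aligned columns (ignoring gaps) at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)

def nearest_neighbour(name, domains):
    """Name of the domain with the smallest p-distance to `name`."""
    query = domains[name]["seq"]
    others = (n for n in domains if n != name)
    return min(others, key=lambda n: p_distance(query, domains[n]["seq"]))

def grouping_summary(domains):
    """Count domains whose nearest neighbour shares their module / their substrate."""
    same_module = same_substrate = 0
    for name, info in domains.items():
        neighbour = domains[nearest_neighbour(name, domains)]
        same_module += int(info["module"] == neighbour["module"])
        same_substrate += int(info["substrate"] == neighbour["substrate"])
    return same_module, same_substrate

# Hypothetical placeholder alignment: NOT real sequences, modules, or substrates.
toy = {
    "EnzX_A3": {"module": 3, "substrate": "Phe", "seq": "MKTAYLAGDE"},
    "EnzY_A3": {"module": 3, "substrate": "Leu", "seq": "MKTAYLAGDQ"},
    "EnzX_A4": {"module": 4, "substrate": "Val", "seq": "MRSGFIVHNE"},
    "EnzY_A4": {"module": 4, "substrate": "Phe", "seq": "MRSGFIVHNQ"},
}

print(grouping_summary(toy))
```

In this toy set every domain pairs with the domain from the same module and none pairs by substrate, which is the pattern described for the A domains. With real data, the input would be a proper multiple sequence alignment of the extracted domains, and a tree with support values would remain the appropriate analysis.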

Conclusion
In this study, unguisins B and J were isolated from A. heteromorphus CBS 117.55, which has not been extensively investigated for secondary metabolite production. A BGC encoding the unguisins was identified by genome mining and showed high homology to ung BGCs from other Aspergillus spp. Phylogenetic analysis of the A and C domains extracted from the UngA NRPS indicates that domains within the same module are more closely related, even when substrate specificity differs, than domains from other modules that accept the same substrates.

Bus Module, CTO-20A column oven, DGU-20A Degassing Unit and SIL-20A AutoSampler) coupled to a Shimadzu SPD-20A UV-vis Detector system using a RP-18 column (Shimadzu, Premier 250 × 10 mm i.d., 5 µm, flow rate of 3.0 mL min−1). High-resolution mass spectra were recorded on an ABSciex TripleTOF 6600+ mass spectrometer. Direct infusion of compounds 1 and 2 into the high-resolution mass spectrometer (HRMS) was performed at a flow rate of 10 μL min−1, with the samples diluted to 10 ppm in a solution of MeCN/H2O (50:50, v/v) containing 0.1% formic acid. The declustering and entrance potentials were held constant for MS and MS/MS at 150 V and 10 V, respectively. The collision energies for the MS and MS2 scan surveys were 10 V and 45 V, respectively, with a collision energy spread of 12 V for the MS2 scan survey. The precursor ion was fragmented at three different collision energies (33, 45, and 57 V), and the resulting MS2 spectra were combined into one final MS2 spectrum. The mass spectra were acquired using Turbo Spray Ionization set to 5.5 kV in positive ion mode with an accumulation time of 100 ms. The mass ranges for the MS and MS2 scan surveys were 500-800 amu and 30-800 amu, respectively. The curtain gas (nitrogen), nebulizing gas, and heating gas were fixed at 25 psi, 20 psi, and 15 psi, respectively. The temperature of the source was 25 °C. MS spectra were acquired and processed using Analyst TF 1.8.1 software.
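The three MS2 collision energies quoted above follow from the centre value and the spread (45 V ± 12 V). The sketch below reproduces that arithmetic and shows one simple way in which spectra acquired at the three energies could be merged into a single peak list. The merging scheme (summing intensities in fixed-width m/z bins) is an assumption made for illustration and is not necessarily how the instrument software combines spectra; the function names and the example peak lists are hypothetical.

```python
from collections import defaultdict

def ce_levels(center_v, spread_v):
    """Collision energies implied by a centre value and a spread (in V)."""
    return [center_v - spread_v, center_v, center_v + spread_v]

def combine_spectra(spectra, bin_width=0.01):
    """Merge peak lists by summing intensities that fall into the same m/z bin.

    `spectra` is a list of peak lists, each a list of (m/z, intensity) tuples.
    Returns a list of (bin m/z, summed intensity) sorted by m/z.
    """
    binned = defaultdict(float)
    for peaks in spectra:
        for mz, intensity in peaks:
            binned[round(mz / bin_width)] += intensity
    return sorted((key * bin_width, total) for key, total in binned.items())

levels = ce_levels(45, 12)
print(levels)

# Hypothetical peak lists, one per collision energy (not measured data).
example = [
    [(100.00, 1.0)],
    [(100.00, 2.0), (200.00, 5.0)],
    [(200.00, 1.5)],
]
merged = combine_spectra(example)
print(merged)
```

Summing in bins keeps fragments that appear only at low or only at high energy in the final spectrum, which is the practical reason for ramping the collision energy in the first place.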

Fungal growth and extraction
A. heteromorphus CBS 117.55 was cultivated in two Erlenmeyer flasks (500 mL), each containing 90 g of rice and 150 mL of H2O [16]. The medium was autoclaved at 121 °C for 20 min. After sterilization, the medium was inoculated with a spore solution of A. heteromorphus (1 mL) and incubated in static mode at 25 °C for 21 days. The following day, the cultured mass in the flasks was ground and extracted with ethyl acetate (EtOAc, 3 × 100 mL). The EtOAc fraction was dried using a rotary evaporator and then dissolved in CH3CN for defatting by partitioning with hexane. The CH3CN fraction was evaporated, yielding 0.601 g of organic-soluble extract.

Fractionation and isolation of unguisins J and B
The organic-soluble extract was fractionated by preparative HPLC-PDA using a Kinetex RP18 column (250 mm × 30 mm i.d., 5 μm) and a UV detector at λmax = 254 nm. The mobile

Figure 4: Clinker analysis of identified unguisin-encoding BGCs. UngE' is a methyltransferase that methylates phenylalanine and appears only in the A. campestris BGC.

Figure 5: Phylogenetic analysis of A domains extracted from UngA NRPS. The substrate of the A domain is indicated for each clade.

Figure 6: Phylogenetic analysis of C domains extracted from UngA NRPS. The substrates condensed by each C domain are indicated.