Hyperglycemia causes diabetic nephropathy, a condition for which there are no specific diagnostic markers that predict progression to renal failure. Here we describe a multiplatform metabolomic analysis of urine from individuals with type 2 diabetes, collected before and immediately following experimental hyperglycemia. We used targeted nuclear magnetic resonance spectroscopy (NMR), liquid chromatography - mass spectrometry (LC-MS) and gas chromatography - MS (GC-MS) to identify markers of hyperglycemia. Following optimization of data normalisation and statistical analysis, we identified a reproducible NMR and LC-MS based urine signature of hyperglycemia. Significant increases of alanine, alloisoleucine, isoleucine, leucine, N-isovaleroylglycine, valine, choline, lactate and taurine and decreases of arginine, gamma-aminobutyric acid, hippurate, suberate and N-acetylglutamate were observed. GC-MS analysis identified a number of metabolites differentially present in post-glucose versus baseline urine, but these could not be identified using current metabolite libraries. This analysis is an important first step towards identifying biomarkers of early-stage diabetic nephropathy.
Keywords: Metabolomics; Urine; Hyperglycemia; Hyperglycaemia; Diabetes; NMR; GC; LC; MS; Biomarker
Diabetic nephropathy is caused by hyperglycemia and is the leading cause of end-stage renal failure in developed countries. Albuminuria is a sensitive but non-specific marker of diabetic nephropathy [2-6]. A better understanding of the effects of hyperglycemia on urine composition could lead to an improved test for this disease.
Metabolite profiling or metabolomics of biofluids such as serum and urine is increasingly used to identify biomarkers for a wide variety of diseases [7,8]. The urine metabolome comprises hundreds of small molecules filtered from the blood or produced by the kidney. Many of these can be accurately measured using nuclear magnetic resonance spectroscopy (NMR) or chromatographic separation (gas chromatography, GC, or liquid chromatography, LC) in combination with mass spectrometry (MS) (for review see ). In humans and animals, diabetes is associated with distinct NMR [10-13] and MS urine profiles, although consensus between studies is lacking. Inconsistencies probably reflect differences in methodologies and analytical platforms, inter-species differences and, for human studies, differences in donor populations. A major challenge for urinary metabolite analysis is raw data normalisation [15-18], which is not always performed and may seriously affect the final outcome. Another issue is type 1 error: false positives are likely prevalent in the reported literature, as no published studies of urine metabolomics have been validated by performing repeat experiments.
In this report, we describe the NMR, GC-MS and LC-MS targeted profiles of urine from people with type 2 diabetes, collected before and immediately following an intravenous glucose challenge in two repeated, independent experiments (2010 and 2011). We highlight the importance of normalisation and statistical analysis of multi-platform data to identify a profile for hyperglycemia, and discuss reproducibility between independent experiments. These methods are an important first step towards identifying a urine metabolomic profile of early-stage diabetic nephropathy.
Mid-stream urine was collected from overweight (body-mass index 25-30) people with type 2 diabetes of less than 5 years duration immediately before and 20-30 min after intravenous administration of a bolus of 50 ml 50% w/v glucose. All subjects had normal serum creatinine concentration, and 4 out of 46 had an elevated urinary albumin/creatinine ratio, indicative of microalbuminuria. Baseline venous glucose ranged from 4.2-14.4 mmol/l and peaked between 17.3 and 24.5 mmol/l within 6 minutes of glucose infusion. Samples were stored on ice for up to 1 hour and then frozen at -70°C until analysed. This study was approved by the Monash University Human Research Ethics Committee; detailed patient information is available in Table S0.
Two sample sets were analysed, containing paired urine samples collected in either 2010 or 2011. Frozen urine was thawed at room temperature for 30 min and aliquoted for GC-MS, LC-MS and NMR sample preparations. Pooled biological quality control samples were obtained by taking an equal volume of all samples.
Metabolite extraction and data acquisition
Detailed descriptions of metabolite extraction procedures and conditions of data acquisition on NMR, GC-MS and LC-MS of both the 2010 and 2011 sample set are presented in Appendix 1.
Data processing and statistical analysis
For all platforms, pooled biological controls were inspected to validate technical consistency. Total ion chromatograms and corresponding mass spectra obtained by GC-MS were evaluated using the Chemstation program (Agilent Technologies, Santa Clara, USA) and deconvoluted using AMDIS (NIST, www.chemdata.nist.gov). Identification of metabolites was based on comparison with in-house libraries containing retention time and mass spectra (Metabolomics Australia). All matching mass spectra were additionally verified by determining the retention time of authentic standard substances. Unknown compounds that could still be recognised by their specific retention times and mass spectra (Mass Spectral Tags, MSTs), but that were not suitable for strictly targeted analysis, were retained in one dataset (2011) in order to estimate their importance. Targeted data matrices were built using AnalyzerPro (SpectralWorks Ltd, Runcorn, UK) and all missing values were manually verified and replaced when not detected by the algorithm. Identifications were further verified by reanalysing the data using the same library but a different algorithm (PyMS, www.code.google.com/p/pyms). Final GC-MS data matrices contained 49 (2010) or 34 (2011) known compounds in addition to 43 unknowns (2011).
NMR data was processed using Chenomx NMR Suite (Chenomx Inc., Edmonton, Alberta, Canada). All spectra were normalised to the standardised area of the DSS signal. For 2010 spectra, the 0.00-10.00 ppm spectral region of all samples was binned using a 0.04 ppm bin width, omitting the water-urea region (4.50-6.00 ppm), after which unsupervised statistical analysis (principal components analysis, hierarchical clustering analysis) was applied to select interesting spectral regions (SIMCA-P, Umetrics, Umeå, Sweden and R, R Development Core Team). A list of 53 compounds of interest was compiled based on this information. This list was also used to generate the 2011 targeted data matrix. Metabolites were identified and quantified using the Chenomx 5.1 NMR Suite Profiler module and the 800 MHz/600 MHz compound libraries for samples in the pH range of 6-8, using a metabolite quantification algorithm such as that described by Dreier and Wider.
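The binning step described above can be illustrated with a minimal Python sketch. It is not the SIMCA-P or R procedure used in the study: the function name `bin_spectrum`, its parameters and the choice to sum intensities within each bin are assumptions made purely for illustration.

```python
def bin_spectrum(ppm, intensity, lo=0.0, hi=10.0, width=0.04, exclude=(4.50, 6.00)):
    """Sum intensities into fixed-width ppm bins (hypothetical helper).

    Points outside [lo, hi) or inside the excluded water-urea region
    are skipped. Returns a dict mapping bin index -> summed intensity.
    """
    n_bins = round((hi - lo) / width)
    binned = {}
    for shift, value in zip(ppm, intensity):
        if not lo <= shift < hi:
            continue  # outside the analysed spectral window
        if exclude[0] <= shift < exclude[1]:
            continue  # water-urea region is omitted
        index = min(int((shift - lo) / width), n_bins - 1)
        binned[index] = binned.get(index, 0.0) + value
    return binned
```

Bin index k covers the interval from lo + k × width up to, but not including, lo + (k + 1) × width; a point at 5.0 ppm is dropped because it lies in the excluded region.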
LC-MS chromatograms and mass spectra were evaluated using the Mass Hunter Quantitative Analysis Program (Agilent Technologies, Santa Clara, USA). Quantification of amines was achieved using an external calibration curve method with an internal standard, 2-aminobutyric acid (25 μM), for instrument/analyst error correction. Response ratios were calculated by dividing the area of each analyte by the area of the internal standard, and concentrations were then determined using the calibration curve. Data for 26 biological amines were present in the 2010 data matrix, whereas 35 could be quantified in the 2011 experiment.
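The response-ratio calculation can be sketched as follows. This is an illustration only: the text does not state the form of the calibration curve, so a straight-line least-squares fit of response ratio against standard concentration is assumed here, and the function names `fit_calibration` and `quantify` are hypothetical.

```python
def fit_calibration(concentrations, response_ratios):
    """Least-squares line through calibration standards (assumed linear).

    Returns (slope, intercept) for: ratio = slope * concentration + intercept.
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(response_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, response_ratios))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(analyte_area, internal_standard_area, slope, intercept):
    """Convert peak areas to a concentration via the response ratio."""
    ratio = analyte_area / internal_standard_area
    return (ratio - intercept) / slope
```

Dividing by the internal standard area first means that a run-to-run change in overall instrument response scales both areas and cancels in the ratio, which is the purpose of the internal standard described above.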
Whereas we only targeted amines via LC-MS, NMR and GC-MS data also contained other polar metabolites such as organic acids and sugars (apart from glucose, GC-MS only). All data matrices were subsequently analysed using an in-house R based statistical package as well as The Unscrambler X (Camo Software, Oslo, Norway). Raw data matrices are often right-skewed, cover a wide range of measured concentrations and are usually unsuited to most standard statistical analyses. Hence, data pre-processing and pre-treatment tools are important for converting the data matrices to more appropriate formats [27,28]. To make samples comparable with one another, sample-wise normalization was performed using either the sample median or the corresponding creatinine values, depending on the appropriateness of the normalization method for the corresponding data matrix. Where possible, missing values were checked manually and confirmed from the raw data files as genuinely missing from the samples. Metabolites that were missing in more than 75% of the samples in both pre and post groups, and that were present in substantially low amounts in the remaining samples of both groups, were excluded from further analysis. Missing values that were found to be below the instrumental detection limits were replaced by half of the minimum value of the entire data matrix (e.g. ). Remaining missing values were treated as missing for those statistical methods that can accommodate missing values (such as t-tests), but were replaced by means of nearest neighbours [29,30] for other statistical methods that require a complete data matrix, such as principal component analysis. For each dataset, the suitability of an appropriate transformation (e.g., log transformation, square-root transformation, and other Box-Cox transformations [27,31]) was explored to achieve normality, remove heteroscedasticity, and change the scale of the data for statistical analyses.
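A minimal Python sketch of three of these pre-processing steps is given below. It is not the in-house R package: the function names are hypothetical, missing values are represented by `None`, rows are samples and columns are metabolites, and the order of the steps (normalise, then impute, then transform) is an assumption. For simplicity every missing value is imputed, whereas in the study only values below the detection limit were replaced this way.

```python
import math
from statistics import median


def median_normalise(matrix):
    """Divide each sample (row) by the median of its observed values."""
    out = []
    for row in matrix:
        m = median(v for v in row if v is not None)
        out.append([None if v is None else v / m for v in row])
    return out


def half_min_impute(matrix):
    """Replace missing values by half the minimum of the whole matrix."""
    fill = min(v for row in matrix for v in row if v is not None) / 2
    return [[fill if v is None else v for v in row] for row in matrix]


def log2_transform(matrix):
    """Log2-transform all observed values."""
    return [[None if v is None else math.log2(v) for v in row] for row in matrix]


def preprocess(matrix):
    """Assumed order: normalise, impute, transform."""
    return log2_transform(half_min_impute(median_normalise(matrix)))
```

Creatinine normalisation would follow the same pattern as `median_normalise`, dividing each row by that sample's creatinine value instead of its median.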
For all data matrices, log transformation was found to be the most appropriate. Potential outliers were identified and checked based on the Z-score method in a metabolite-wise manner for each group, as well as from PCA plots. Paired outliers, present in both the pre-glucose and post-glucose urine samples collected from the same patient, were kept in the dataset, whereas a very small number of unpaired outliers that were likely caused by analytical or operational errors were replaced with a missing value. For each metabolite, a paired sample t test was used to test the null hypothesis that the mean difference between post-glucose and baseline samples equals zero. For each platform (GC-MS, LC-MS, NMR) and experiment (2010 or 2011), a set of significantly changed metabolites was then identified. To be consistent with the biological literature, we employed a significance level of 0.05 for the t tests, and used the conservative Bonferroni p-value method to adjust for multiple comparisons. Normality was assessed by means of Shapiro-Wilk and Anderson-Darling normality tests, and by manual inspection of metabolite boxplots. In addition to t tests, metabolites failing normality were re-evaluated using a non-parametric approach (Wilcoxon signed-rank test). We did not observe any inconsistencies between parametric and non-parametric results. Average fold changes were calculated as the difference (post minus pre) of the group averages of log2 transformed data (e.g. ).
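The per-metabolite calculations described above can be sketched in Python using only the standard library. The function names are hypothetical and the code is an illustration, not the package used in the study. Converting the t statistic to a p-value requires the t distribution, which the standard library does not provide; a routine such as `scipy.stats.ttest_rel` could be used for that step.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Z-score of each value within one metabolite and one group."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def paired_t_statistic(pre, post):
    """Paired t statistic and degrees of freedom for H0: mean difference = 0."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def log2_fold_change(pre_log2, post_log2):
    """Difference (post minus pre) of group means of log2-transformed data."""
    return mean(post_log2) - mean(pre_log2)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

A metabolite would then be accepted when its Bonferroni-adjusted p-value is below 0.05. The source does not state the Z-score cut-off used to flag outliers, so none is hard-coded here.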
Two groups of urine samples, collected in 2010 or 2011 from people with type 2 diabetes, were subjected to cross-platform metabolomic analysis. This report focuses on the choice of normalisation method, and the value of cross-platform analysis and replicate experiments to identify reproducible urine biomarkers of acute hyperglycemia.
Glycosuria does not confound metabolite identification
High glucose concentrations can cause non-enzymatic modification of amine-containing compounds, and glycosuria could possibly lead to changes in other metabolites during sample processing and analysis (e.g., react with other compounds during solvent extraction, injection, chromatography, ionisation, fragmentation or detection). This could result in false biomarker assignment. To investigate the possibility of artificial biomarkers, we mimicked the biochemical effect of glycosuria by adding 150 mM pure glucose to baseline samples. These were incubated for 1 hour and then extracted in an identical manner to that of the other samples. All of the spectral signals generated by spiking urine with glucose were also identified in a glucose blank (150 mM glucose in water). This indicates that no spurious metabolites were generated by high urinary glucose levels. We believe that this is an important control, as metabolite extraction procedures vary and the possibility of false positives originating in high glucose levels cannot otherwise be ruled out.
Median data normalisation is optimal
Urine, in contrast to most other biofluids, is not well buffered and the concentration of urinary metabolites varies in response to variations in serum osmolarity. Despite this, it is still common practice to normalise against urinary creatinine, a readily detectable metabolite [15-18]. However, creatinine levels may be variable in diabetic patients and can be affected by hyperglycemia [11,12,36,37]. Other methods such as normalising to total spectral area or urine osmolarity have been described, but we could not use these because both spectral area and osmolarity changed substantially following glucose infusion. Alternatively, a part of the spectral area could be chosen as reliable for normalisation, but this choice would be subjective and prone to operator and spectral processing errors.
We hypothesised that median normalisation, which generally correlates with overall sample abundance, would be the best method to use. To confirm this, we compared median and creatinine normalisation methods for data generated by NMR (Table 1, Figure 1 and Figure S1).
[Table 1 body not recovered. Column headers: metabolite; nr.; then, under each of "median normalisation" and "creatinine normalisation", two sets of Bonferroni p, log2FC and error.]
Table 1: Metabolites detected by NMR that changed following glucose infusion, normalised to either the sample median or corresponding creatinine concentration. Only metabolites with a p-value below 0.05 after Bonferroni correction (Bonferroni p) were accepted and no restriction was applied with regard to fold change (log2FC = log2 fold change, with baseline measurements as reference). Empty spaces represent detected, but non-(significantly) changing metabolites. *=inconsistent result for this metabolite.
Figure 1: A. Log2 fold changes of consistently differential metabolites as detected by NMR after median normalisation (black: 2010, dark grey: 2011 results) or creatinine normalisation (light grey: 2010, white: 2011 results) working at the 95% confidence level after Bonferroni correction of the p-values. B. PCA scores and loadings plot for median normalisation indicating significantly changing metabolites (2011 data, black = pre, grey = post intravenous glucose). Number identifiers refer to corresponding metabolites listed in Table 1.
For NMR data, both median and creatinine normalisation led to the selection of several biomarkers, although some were restricted to one normalisation method only (Table 1 and Figure 1A). While there are biological arguments for not using creatinine for normalisation, here, each method reproducibly identified biomarkers to a similar extent (Table 1 and Figure 1A). In addition, creatinine levels correlated well with overall sample metabolite levels (Figure S2A), supporting its use as a normalising factor in these datasets. This unexpected result probably reflects the fact that all patients in our study had normal renal function and therefore similar rates of creatinine clearance into the urine. It should be noted, however, that when the 2011 NMR dataset was normalised to the median, creatinine levels were significantly reduced after glucose infusion (Table 1). This finding has been observed previously [12,13,36,37]. Creatinine normalisation of NMR data also rendered it less suited for PCA (Figures 1B and S1). We did not perform a similar comparison of normalisation methods for LC-MS and GC-MS data because, unlike for NMR data, creatinine concentrations did not correlate well with overall sample abundance (Figure S2). Therefore, median normalisation is superior to creatinine normalisation for all data analysed in this study. There is one caveat, however. While median normalisation is generally robust, it is easily compromised when there is a large difference in the number of detected metabolites between groups. This was the case in the 2011 GC-MS dataset, where a number of mostly unknown compounds (see “Data processing and statistical analysis”) were not detected in baseline samples, but clearly present following hyperglycemia. Such ‘off-on’ compounds, which rendered the sample median too variable to use as a ‘constant’ normalising factor, were readily detected in a volcano plot because they caused marked skewing (Figure S3). Therefore, for GC-MS data, we removed off-on compounds before calculating the sample median.
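The effect of off-on compounds on the sample median can be illustrated with a short Python sketch. The criterion used here (absent from every baseline sample, present in at least one post-glucose sample) and the function names are assumptions for illustration; in the study these compounds were recognised from a volcano plot. Missing values are represented by `None`, rows are samples and columns are metabolites.

```python
from statistics import median


def off_on_columns(pre, post):
    """Indices of metabolites absent in all pre samples but seen post."""
    flagged = set()
    for j in range(len(pre[0])):
        absent_pre = all(row[j] is None for row in pre)
        present_post = any(row[j] is not None for row in post)
        if absent_pre and present_post:
            flagged.add(j)
    return flagged


def sample_medians(matrix, skip=frozenset()):
    """Median of the observed values per sample, ignoring skipped columns."""
    return [
        median(v for j, v in enumerate(row) if j not in skip and v is not None)
        for row in matrix
    ]
```

With the flagged columns skipped, pre and post medians are computed over the same set of metabolites, so the normalising factor is no longer shifted by compounds that appear only after glucose infusion.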
Biomarkers of acute hyperglycemia in type 2 diabetes
Because the groups of metabolites detected by NMR, LC-MS and GC-MS do not fully overlap, a cross-platform approach to biomarker discovery ensures the most thorough interrogation of a metabolome. In addition, the identification of the same biomarker on different platforms improves specificity.
Of all identified metabolites across the three platforms, ~16% were detected on more than one platform (18 out of 106 for 2010 data; 16 out of 101 for 2011 data).
Bona fide biomarkers are ideally applicable to a variety of patients. However, the metabolome is very sensitive to factors such as genetic background, age or gender [10,12,39]. These variations make single analyses susceptible to type 1 error and highlight the need to validate initial findings in a second experiment. To explore this, we compared data from two separate patient cohorts (2010 and 2011 datasets, Tables S1-3).
Biomarkers common to both datasets included alanine, alloisoleucine, isoleucine, leucine, N-isovaleroylglycine and valine (increased following hyperglycemia) and arginine, gamma-aminobutyric acid (GABA), hippurate and suberate (reduced following hyperglycemia; Figure 2 and Table S1).
Figure 2: Selected biomarkers for acute hyperglycemia in type 2 diabetic urine (NMR and LC-MS). Potential additional markers N-acetylglutamate, choline, lactate and taurine are borderline-selected based on Bonferroni corrected p-value (see Table S1; black = 2010 NMR, light grey = 2010 LC-MS, dark grey = 2011 NMR, white = 2011 LC-MS).
The strict Bonferroni correction method used to identify biomarkers ensured a minimum of false positive results, at the expense of rejecting a few biomarkers with one robust and one borderline p-value in each of the 2010 or 2011 experiments. We retained these initially rejected compounds (choline, lactate, taurine and N-acetylglutamate) as potential additional biomarkers, keeping in mind that their fold changes were moderate (Figure 2 and Table S1).
We also identified metabolites that were detected in only one of the two urine collections (2010 or 2011, Table S2). Because these compounds were also not replicated across platforms, their utility as biomarkers will need to be validated in future analyses.
When we analysed all data using principal components analysis, we observed no overt grouping of samples according to glucose treatment (Figure S4), although the major tendency for grouping was always in agreement with our tested hypothesis. These findings most likely reflect genetic and environmental differences between individuals.
GC-MS detected the most prominent changes in response to hyperglycemia (Figure S4), with several metabolites not detected in baseline samples and readily detected after glucose infusion. However, nearly all of these were unknown compounds, some of which had mass characteristics of sugars (Table S3). When these compounds were removed from the analysis, GC-MS data did not identify reproducible biomarkers (Table S1). It will therefore be important to determine the molecular nature of these ‘off-on’ compounds in future studies.
Levels of branched chain amino acids (BCAAs: isoleucine, leucine and valine) and isoleucine-derived amines (alloisoleucine and N-isovaleroylglycine) were elevated following hyperglycemia. These findings are consistent with previous studies that described increased plasma BCAA levels in diabetes and pre-diabetes [41-44]. Similar changes were observed following long-term (7 days), but not short-term (6 h), glucose treatment of endothelial cells. The increases in BCAAs we observed may therefore reflect direct effects of glucose that increase BCAA levels either in the kidneys or at extra-renal sites, with subsequent filtration of blood-borne BCAAs into urine.
Alanine levels were also elevated following glucose infusion, whereas taurine was decreased. Alanine, which can be produced by the transamination of glucose-derived pyruvate, is also elevated in glucose-treated endothelial cells and hyperglycemic rat urine. Alanine is therefore most probably an immediate downstream marker of hyperglycemia. Plasma taurine levels are increased in diabetes and urinary taurine levels correlate with liver stress, bladder cancer and hypertension [47-51]. Because taurine supplementation prevents nephropathy in diabetic rats, reduced urinary taurine following hyperglycaemia might be relevant to the pathogenesis of diabetic nephropathy.
In contrast to previous reports [12,45], hyperglycemia reduced the levels of lactate, arginine and GABA in urine. These apparent discrepancies may reflect differences between tissues and biofluids, species differences or different timescales (acute versus long-term hyperglycemia). Finally, there were several metabolites, including hippurate, choline, suberate, N-isovaleroylglycine, N-acetylglutamate, 5-hydroxytryptophan, O-acetylcarnitine, glucosamine and N-acetylglucosamine, tyramine, aconitate, galactonate, ketoglutarate, pyruvate, trehalose, fructose, sorbose, gluconate-1,4-lactone and scyllo-inositol, that were detected by only one of the platforms in only one of the 2010 or 2011 experiments. Further validation studies will be required to determine the significance of these molecules to hyperglycemia and diabetes.
We have evaluated the possibilities and limits of targeted cross-platform metabolomics analysis for urine biomarker discovery in human type 2 diabetes. Using targeted NMR and LC-MS data, we have identified a reproducible urine metabolomic signature of experimental hyperglycemia. Additional unknown compounds were found to be highly differential by GC-MS analysis. This study provides important baseline information on the metabolic changes that occur during acute hyperglycemia, which may in turn underlie the progression to diabetic nephropathy. Ongoing studies using archived urine will determine which of these markers, if any, outperform urinary albumin excretion for diagnosis of early-stage diabetic nephropathy.
The authors wish to thank Jenny Chambers, Daniel Dias, Nirupama Jayasinghe, Terra Stark and Moshe Olshansky for their help in accomplishing this work. LT, ADL, JB, JRS, DLC, DLT, MJM and UR are thankful for funding from Metabolomics Australia at The University of Melbourne, a member of Bioplatforms Australia Pty Ltd which is funded through the National Collaborative Research Infrastructure Strategy (NCRIS), 5.1 Biomolecular Platforms and Informatics and co-investments from the Victorian Government. This work was supported by a grant from the CASS Foundation (www.cassfoundation.org) and depended on Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIIS. LT is an FWO-Flanders postdoctoral fellow. MJM is a NHMRC Principal Research Fellow.