A look behind the scenes
In this first post we’ll take a look at what goes on behind the scenes when running a typical chemometric application. We’ll be using 60 MHz NMR data of Diesel samples to model three different properties: hydrogen content, aromatic content, and density.
The processing starts from the raw FID data with the usual combination of apodization, linear prediction if needed, and FFT.

The raw NMR data is known as the Free Induction Decay (FID), which is a sum of exponentially decaying sinusoidal signals.
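As a rough illustration of that description, an FID can be simulated as a sum of complex exponentials, each with its own frequency, amplitude, and decay constant. The function name `synthetic_fid` and every numerical value below are made up for the example; none of them come from the Diesel data.

```python
import numpy as np

def synthetic_fid(freqs_hz, amplitudes, t2_s, n_points=4096, dwell_s=1e-3):
    """Noiseless model FID: a sum of exponentially decaying complex sinusoids."""
    t = np.arange(n_points) * dwell_s
    fid = np.zeros(n_points, dtype=complex)
    for f, a, t2 in zip(freqs_hz, amplitudes, t2_s):
        # Oscillation at frequency f, damped with time constant t2
        fid += a * np.exp(2j * np.pi * f * t) * np.exp(-t / t2)
    return t, fid

# Two hypothetical resonances at 50 Hz and 120 Hz (illustrative values only)
t, fid = synthetic_fid([50.0, 120.0], [1.0, 0.5], [0.3, 0.2])
```

The first point of the FID is simply the sum of the amplitudes, and the signal has decayed almost completely by the end of the acquisition window.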

After Fourier transform of the FID we obtain an unphased frequency-domain signal.
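The apodization and FFT steps can be sketched in a few lines of NumPy. This is a minimal sketch, assuming an exponential (line-broadening) window and simple zero-filling. The post does not say which window is used, linear prediction is left out, and `fid_to_spectrum`, `lb_hz`, and `zero_fill` are names invented for the example.

```python
import numpy as np

def fid_to_spectrum(fid, dwell_s, lb_hz=1.0, zero_fill=2):
    """Apodize, zero-fill, and Fourier transform a complex FID."""
    n = fid.size
    t = np.arange(n) * dwell_s
    # Exponential window: adds lb_hz of line broadening and damps late-time noise
    window = np.exp(-np.pi * lb_hz * t)
    n_fft = zero_fill * n
    # fft() pads with zeros up to n_fft; fftshift puts zero frequency in the middle
    spectrum = np.fft.fftshift(np.fft.fft(fid * window, n=n_fft))
    freqs_hz = np.fft.fftshift(np.fft.fftfreq(n_fft, d=dwell_s))
    return freqs_hz, spectrum

# Hypothetical single resonance at 100 Hz with a 40 degree phase error
dwell_s = 1e-3
t = np.arange(2048) * dwell_s
fid = np.exp(2j * np.pi * 100.0 * t) * np.exp(-t / 0.3) * np.exp(1j * np.radians(40.0))
freqs_hz, spectrum = fid_to_spectrum(fid, dwell_s)
peak_hz = freqs_hz[np.argmax(np.abs(spectrum))]
```

Because of the phase error, the real part of `spectrum` is a mix of absorption and dispersion line shapes, which is what "unphased" means here. The magnitude still peaks at the resonance frequency.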
One critical aspect of automating a process is developing robust algorithms for each step. For us, automation is not only about piling up a list of generic functions that generally work, but about developing robust procedures that produce results with a very high success rate.
Phase correction is a case in point. We have developed robust algorithms that converge to the right phase correction in the great majority of the cases we have encountered. When applied to the Diesel data, the initial phase correction gives acceptable results, as shown in the figures below. But when we look at the baseline in more detail, we see a peak-to-peak distortion of 0.45% relative to the most intense peak in the spectrum. This might not seem like much, but it can be detrimental to the final results. The distortion is more evident when we zoom in on the baseline.

Initial phase correction

Vertical expansion shows non-flat baseline
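To give a feel for what an automatic phase correction does, here is a deliberately simple sketch. It is not the algorithm used for the Diesel data, which the post does not describe. It assumes a pure zero-order phase error and a spectrum whose peaks are all positive, and it picks the phase that maximizes the summed real part. `zero_order_phase` and `apply_phase` are names made up for the example.

```python
import numpy as np

def zero_order_phase(spectrum):
    """Phase (radians) that maximizes the sum of the real part.

    sum(Re(S * exp(-i*phi))) = Re(exp(-i*phi) * sum(S)), which is largest
    when phi equals the angle of sum(S).
    """
    return np.angle(np.sum(spectrum))

def apply_phase(spectrum, phi_rad):
    """Remove a zero-order phase of phi_rad from a complex spectrum."""
    return spectrum * np.exp(-1j * phi_rad)

# Hypothetical two-line FID with a 40 degree zero-order phase error
dwell_s = 1e-3
t = np.arange(4096) * dwell_s
fid = (1.0 * np.exp(2j * np.pi * 50.0 * t - t / 0.3)
       + 0.5 * np.exp(2j * np.pi * 120.0 * t - t / 0.2))
fid = fid * np.exp(1j * np.radians(40.0))

spectrum = np.fft.fftshift(np.fft.fft(fid))
phi = zero_order_phase(spectrum)
phased = apply_phase(spectrum, phi)
```

Real spectra also carry a frequency-dependent (first-order) phase error, noise, and baseline artifacts, so a criterion this simple would not be robust enough on its own. It only shows the shape of the problem: choose a phase, apply it, and score the result.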
Baseline correction is the next step, and here we also need robust algorithms. Many algorithms have been published in the literature. Our approach is to recognize baseline points and then fit a corrective function through them. This method usually gives very good results, as shown in the following figures.


Final results after baseline correction, compared with the baseline after phase correction. The distortion went down from an initial 0.45% after phase correction to 0.045% after baseline correction, a ten-fold decrease.
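The idea of recognizing baseline points and fitting a corrective function through them can be sketched as follows. The choices here are assumptions made for illustration, not the method used for the figures: baseline points are found by iterative clipping of everything that rises above the current fit, the corrective function is a low-order polynomial, and the test spectrum is synthetic. The names `fit_baseline` and `distortion_percent` are hypothetical; the second computes the peak-to-peak baseline distortion as a percentage of the most intense peak.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_baseline(y, degree=3, k=2.5, max_iter=30):
    """Iteratively pick baseline points and fit a polynomial through them.

    Assumes positive peaks: points more than k standard deviations above
    the current fit are treated as signal and left out of the next fit.
    """
    x = np.linspace(-1.0, 1.0, y.size)
    mask = np.ones(y.size, dtype=bool)
    for _ in range(max_iter):
        baseline = P.polyval(x, P.polyfit(x[mask], y[mask], degree))
        resid = y - baseline
        new_mask = resid < k * resid[mask].std()
        if np.array_equal(new_mask, mask):
            break  # the set of baseline points has stopped changing
        mask = new_mask
    return baseline, mask

def distortion_percent(y, mask):
    """Peak-to-peak spread of the baseline points, in % of the tallest peak."""
    return 100.0 * (y[mask].max() - y[mask].min()) / y.max()

# Synthetic spectrum: two Gaussian peaks on a curved baseline, plus noise
x = np.linspace(-1.0, 1.0, 2000)
true_baseline = 0.02 + 0.03 * x + 0.02 * x**2
peaks = np.exp(-0.5 * ((x + 0.3) / 0.01) ** 2) + 0.6 * np.exp(-0.5 * ((x - 0.4) / 0.015) ** 2)
y = true_baseline + peaks + np.random.default_rng(0).normal(0.0, 2e-4, x.size)

baseline, mask = fit_baseline(y)
corrected = y - baseline
before = distortion_percent(y, mask)
after = distortion_percent(corrected, mask)
```

Comparing `before` and `after` gives the same kind of figure of merit quoted in the caption above, although the values for this synthetic example have nothing to do with the Diesel numbers.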
The data is then referenced and binned, at which point it is ready for the chemometric model to be applied and the results reported. After data reduction, as many models as needed can be applied, which is a great advantage over other methods since only one measurement is needed.
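Referencing and binning are simple array operations. The sketch below makes two assumptions that are not stated in the post: the spectrum is referenced by moving its most intense point to a fixed index, and binning means summing intensities in equal-width buckets. `reference_to_peak` and `bin_spectrum` are hypothetical helper names.

```python
import numpy as np

def reference_to_peak(y, target_index):
    """Circularly shift y so that its most intense point lands on target_index."""
    return np.roll(y, target_index - int(np.argmax(y)))

def bin_spectrum(y, n_bins):
    """Sum intensities in n_bins (nearly) equal-width buckets."""
    edges = np.linspace(0, y.size, n_bins + 1).astype(int)
    # reduceat sums y[edges[i]:edges[i+1]]; the last bucket runs to the end
    return np.add.reduceat(y, edges[:-1])

# Toy spectrum with two "peaks" (illustrative values only)
y = np.zeros(1000)
y[317] = 1.0
y[640] = 0.4

referenced = reference_to_peak(y, target_index=300)
bins = bin_spectrum(referenced, n_bins=50)
```

Note that `np.roll` wraps points around the ends of the array, which is harmless only when the edges of the spectrum are empty. Summing within buckets preserves the total area, so `bins` is a much shorter vector carrying the same integrated intensity.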
For example, for the Diesel applications developed for this demo, three different properties (aromatic content, hydrogen content, and density) can be estimated from the same processed data. Previously, these properties had to be measured with three different analytical methods, each with its own experimental caveats.
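Applying several models to one reduced spectrum can look like the sketch below. It assumes each model is linear at prediction time (a coefficient vector plus an intercept, as a PLS regression would provide); the post does not say which model type is used. The coefficients here are random placeholders, so the predicted numbers are meaningless and only the mechanics are of interest.

```python
import numpy as np

def predict_properties(binned, models):
    """Apply every linear model (coefficients, intercept) to the same binned data."""
    return {name: float(binned @ coef + intercept)
            for name, (coef, intercept) in models.items()}

rng = np.random.default_rng(0)
n_bins = 50
binned = rng.random(n_bins)  # stand-in for one reduced spectrum

# Placeholder models: random coefficients, NOT fitted to any real data
models = {
    "hydrogen_content": (rng.normal(size=n_bins), 0.0),
    "aromatic_content": (rng.normal(size=n_bins), 0.0),
    "density": (rng.normal(size=n_bins), 0.0),
}

predictions = predict_properties(binned, models)
```

The measurement and the processing happen once; adding a fourth property would only mean adding another entry to `models`.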
One final note: in all the plots in this post, I did not use a chemical shift scale at all. This was on purpose, to drive home the point that for chemometric applications the chemical shift scale only serves as a bridge to the chemist inside us; it’s not needed for any computation or data modelling at all.
But for the sake of completeness, here’s a final plot of the reduced data used to build the chemometric model, automatically processed using the method I just showed you.


In a following post we’ll explore the process of building the chemometric model and finding the optimal parameters through cross-validation.