How to Measure the Performance of Automated Structure Verification Systems

Positive and negative controls are a fundamental part of good experimental design. A positive outcome on a subject known to give a positive result reassures the scientist that the test is working properly. Just as important, if not more so, is testing subjects known to give negative results: a negative outcome when one is expected validates the test and increases confidence in its results when it is applied to unknowns.

Automated Structure Verification (ASV) is no different from any other scientific test. Both positive and negative controls should be tested frequently to optimize performance and to obtain a measure of robustness and confidence in the results.

This post addresses the problem of comparing the performance of different ASV systems (or different parameterizations of the same system). It shows that metrics from the field of statistical classification can be used to quantify the performance of different ASV implementations. It also shows that negative control structures must be designed with great care, because they can dramatically influence the measured performance.

Automatic Generation of Negative Control Structures

To design the negative control structures in a consistent and objective way, a molecular similarity coefficient was developed. The method, termed MolSimNMR, takes two chemical structures as input and outputs the expected NMR data similarity value between them. It has previously been shown that this value is predictive of the similarity between the NMR data of the input structures.


Method Validation

The method was validated against a test set of 100 commercial compounds. The molecular similarity of every pair of compounds (4950 pairs in total) was calculated by the MolSimNMR method and compared to the spectral similarity of the predicted NMR data. 1H-1D and 13C-1D data were predicted, and the 1H-13C HSQC data were then constructed. The NMR data similarity between all pairs of structures was calculated by a 1D and 2D binning technique.
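The post does not spell out the binning technique, so the following is only a minimal sketch of one way a 2D binned comparison of HSQC peak lists could look. The bin widths, the Tanimoto-style score, the function names, and the peak lists are all assumptions made for illustration; they are not the MolSimNMR method or the actual procedure used in the validation.

```python
from itertools import combinations

def binned_peaks(peaks, h_width=0.1, c_width=2.0):
    """Map (1H ppm, 13C ppm) peaks onto a set of occupied 2D bin indices.

    The bin widths are assumed values, not taken from the post.
    """
    return {(int(h // h_width), int(c // c_width)) for h, c in peaks}

def tanimoto(bins_a, bins_b):
    """Shared occupied bins divided by all occupied bins (1.0 = identical)."""
    union = bins_a | bins_b
    if not union:
        return 1.0  # two empty spectra are treated as identical
    return len(bins_a & bins_b) / len(union)

def pairwise_similarity(spectra):
    """Return {(name_a, name_b): similarity} for every unordered pair."""
    binned = {name: binned_peaks(peaks) for name, peaks in spectra.items()}
    return {
        (a, b): tanimoto(binned[a], binned[b])
        for a, b in combinations(sorted(binned), 2)
    }

# Made-up peak lists, for illustration only.
spectra = {
    "cpd_A": [(7.26, 128.4), (2.31, 21.0)],
    "cpd_B": [(7.28, 128.9), (3.80, 55.2)],
    "cpd_C": [(1.25, 29.7)],
}
similarities = pairwise_similarity(spectra)
```

With n compounds there are n(n-1)/2 unordered pairs, which for the 100-compound test set gives the 4950 pairs quoted above.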

[Figure: Plot of calculated versus measured spectral similarity for 1H-13C HSQC data]

ASV as Binary Classifier

An ASV system can be thought of as a binary classifier: a system that assigns each member of a given set to one of two classes. In this case, the NMR data and the proposed chemical structure are either consistent or not consistent with each other.

One of the main advantages of treating ASV as a binary classifier is that we can use a host of metrics that have already been in use for many years and have been validated in many fields.

Performance Metrics for Binary Classifiers

The performance of a binary classifier is measured during the learning phase by running it on a test set of known positive and negative examples. Each result is then labeled as one of the following:


True Positive (TP)     Positive example predicted Positive
False Negative (FN)    Positive example predicted Negative
False Positive (FP)    Negative example predicted Positive
True Negative (TN)     Negative example predicted Negative
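The four labels above can be assigned mechanically once each test case carries both its known control type and the ASV verdict. The sketch below assumes each result is a pair of booleans (is it a positive control, did the system call it consistent); the function names and the sample results are hypothetical.

```python
def label_outcome(is_positive_control, predicted_consistent):
    """Return 'TP', 'FN', 'FP' or 'TN' for a single test case."""
    if is_positive_control:
        return "TP" if predicted_consistent else "FN"
    return "FP" if predicted_consistent else "TN"

def confusion_counts(results):
    """Tally outcomes over (is_positive_control, predicted_consistent) pairs."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for is_positive_control, predicted_consistent in results:
        counts[label_outcome(is_positive_control, predicted_consistent)] += 1
    return counts

# Made-up results: two positive controls and three negative controls.
results = [
    (True, True),    # correct structure accepted
    (True, False),   # correct structure rejected
    (False, True),   # wrong structure accepted
    (False, False),  # wrong structure rejected
    (False, False),  # wrong structure rejected
]
counts = confusion_counts(results)
```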


With these definitions in hand we can compute many different metrics as follows:


True Positive Rate (TPR)           TP/(TP+FN)
False Positive Rate (FPR)          FP/(FP+TN)
Accuracy (ACC)                     (TP+TN)/(TP+FN+FP+TN)
Positive Predictive Value (PPV)    TP/(TP+FP)
Negative Predictive Value (NPV)    TN/(TN+FN)
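These formulas translate directly into code. The sketch below is one possible arrangement; the function names are hypothetical, the counts in the example are made up, and returning NaN for an empty denominator is a design choice of this sketch rather than something the post prescribes.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def classification_metrics(tp, fn, fp, tn):
    """Compute the five metrics listed above from the outcome counts."""
    return {
        "TPR": safe_ratio(tp, tp + fn),
        "FPR": safe_ratio(fp, fp + tn),
        "ACC": safe_ratio(tp + tn, tp + fn + fp + tn),
        "PPV": safe_ratio(tp, tp + fp),
        "NPV": safe_ratio(tn, tn + fn),
    }

# Made-up counts: 50 positive controls and 50 negative controls.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
```

For these illustrative counts, TPR is 40/50 = 0.8, FPR is 5/50 = 0.1, and ACC is 85/100 = 0.85.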


ROC Curve

The Receiver Operating Characteristic (ROC) curve is a graph of the True Positive Rate (TPR) versus the False Positive Rate (FPR) as a function of the operating parameters. It is usually constructed by keeping a set of parameters constant and sweeping the decision threshold across its range. For each threshold value, a new pair of TPR and FPR values is calculated using the formulas stated above.
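The threshold sweep can be sketched as follows. This assumes the ASV system returns a numeric score per test case and that a higher score means the structure is more consistent with the data; both are assumptions, as are the function names and the sample scores. The area under the curve (AUC), computed here with the trapezoidal rule, is a common one-number summary that the post itself does not use.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    scores: one ASV score per test case (higher = more consistent, assumed).
    labels: True for positive controls, False for negative controls.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative control")
    points = [(0.0, 0.0)]  # threshold above every score: nothing accepted
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Made-up scores: three positive and three negative controls.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]
points = roc_points(scores, labels)
area = auc(points)
```

A curve that hugs the top-left corner (AUC close to 1) indicates good separation of positive and negative controls, while the diagonal (AUC of 0.5) corresponds to random guessing.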

Design of Negative Control Structures

The design of the negative control structures is a key element in obtaining reliable performance metrics.

The ROC curves below show that using an “easy” set of negative control structures (NCS) leads to an overestimation of ASV performance (red line, left). Using a more challenging NCS set results in a lower performance curve (green line, left).

The ROC below shows the effect on ASV performance of using increasingly challenging NCS sets (from red to purple).
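One way to see why the difficulty of the NCS set matters is to group the negative controls by their similarity to the corresponding positive control and compute the false positive rate per group. The sketch below assumes a similarity value in the range 0 to 1 is already available for each negative control (for example from a coefficient such as MolSimNMR); the tier boundaries, function names, and numbers are hypothetical and are not results from the post.

```python
def assign_tier(similarity, edges=(0.5, 0.8)):
    """Classify a negative control by similarity to its positive control.

    The tier edges are arbitrary illustrative values.
    """
    if similarity < edges[0]:
        return "easy"
    if similarity < edges[1]:
        return "medium"
    return "hard"

def fpr_by_tier(negatives, threshold):
    """Return {tier: FPR} for (similarity, asv_score) pairs of negatives.

    A negative control counts as a false positive when its score is at or
    above the threshold (higher score = more consistent, assumed).
    """
    totals = {}
    false_positives = {}
    for similarity, score in negatives:
        tier = assign_tier(similarity)
        totals[tier] = totals.get(tier, 0) + 1
        if score >= threshold:
            false_positives[tier] = false_positives.get(tier, 0) + 1
    return {tier: false_positives.get(tier, 0) / totals[tier] for tier in totals}

# Made-up (similarity, ASV score) pairs for six negative controls.
negatives = [
    (0.20, 0.1), (0.30, 0.2),   # easy
    (0.60, 0.4), (0.70, 0.6),   # medium
    (0.90, 0.7), (0.95, 0.4),   # hard
]
tier_fpr = fpr_by_tier(negatives, threshold=0.5)
```

Reporting the false positive rate per tier, rather than one pooled number, makes it explicit how much of the measured performance depends on the composition of the NCS set.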


Conclusions

The performance of ASV systems can be measured with a host of metrics already in use in statistical classification, signal detection, and many other fields.

The ROC curve is a useful visual aid for parameterization and threshold selection.

Negative control structures should be designed carefully, by calibrating their similarity to the positive control structures using the MolSimNMR molecular similarity coefficient.