What is a Confusion Matrix?

For classification, a confusion matrix is a more informative way to evaluate the performance of a classifier than a single accuracy score.

For a binary classifier it is displayed as a 2-by-2 table that permits the visualization of a supervised learning algorithm's performance. Each row of the table represents the instances in an actual class, while each column represents the instances in a predicted class.

Each cell of the table holds the count of predictions broken down by actual and predicted class, so the numbers of correct and incorrect predictions are visible at a glance. The confusion matrix thus displays the ways the classification algorithm was "confused" when it made its predictions, providing insight not only into how many errors were made but also into the types of errors.

There are four possible outcomes when the actual classification is compared to the predicted classification (a counting sketch in Python follows the list):

  • true positive: the actual classification is positive & the predicted classification is positive
  • false negative: the actual classification is positive & the predicted classification is negative
  • false positive: the actual classification is negative & the predicted classification is positive
  • true negative: the actual classification is negative & the predicted classification is negative
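
As a concrete illustration, here is a minimal counting sketch in plain Python (the function name confusion_counts and the sample labels are invented for this example; the label value 1 is assumed to mean "positive"):

    def confusion_counts(actual, predicted, positive=1):
        """Count the four confusion-matrix outcomes for binary labels."""
        tp = fn = fp = tn = 0
        for a, p in zip(actual, predicted):
            if a == positive and p == positive:
                tp += 1   # true positive: actual positive, predicted positive
            elif a == positive:
                fn += 1   # false negative: actual positive, predicted negative
            elif p == positive:
                fp += 1   # false positive: actual negative, predicted positive
            else:
                tn += 1   # true negative: actual negative, predicted negative
        return tp, fn, fp, tn

    # Example: 1 = positive class, 0 = negative class
    actual    = [1, 1, 0, 0, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1, 0, 0, 0]
    tp, fn, fp, tn = confusion_counts(actual, predicted)
    print(tp, fn, fp, tn)   # 2 2 1 3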

Evaluating the performance of the classifier is where the confusion matrix really comes in: a number of standard measures are computed directly from the four counts (a short calculation sketch follows the list):

  • accuracy: the proportion of correct predictions, i.e. the number of true positives plus the number of true negatives divided by the total population (remember: if 99% of individuals are healthy and you always predict 'healthy', you will be right 99% of the time - accurate, yet not very helpful);
    • (∑ True-Positives + ∑ True-Negatives) / Total-Population
  • prevalence: how often the positive class actually occurs in the population;
    • ∑ All-Actual-Positives / Total-Population
  • precision: the fraction of positive predictions that are correct;
    • ∑ True-Positives / ∑ All-Predicted-Positives
  • recall: the fraction of actual positives that are correctly identified (also known as sensitivity);
    • ∑ True-Positives / ∑ All-Actual-Positives
  • specificity: the fraction of actual negatives that are correctly identified;
    • ∑ True-Negatives / ∑ All-Actual-Negatives
  • negative predictive value: the fraction of negative predictions that are correct;
    • ∑ True-Negatives / ∑ All-Predicted-Negatives
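
Continuing the sketch above with the counts tp, fn, fp, and tn from the earlier example (denominators are assumed non-zero for this illustration), each measure is one line of arithmetic:

    total = tp + fn + fp + tn                 # total population
    accuracy    = (tp + tn) / total           # correct predictions / total population
    prevalence  = (tp + fn) / total           # actual positives / total population
    precision   = tp / (tp + fp)              # true positives / all predicted positives
    recall      = tp / (tp + fn)              # true positives / all actual positives
    specificity = tn / (tn + fp)              # true negatives / all actual negatives
    npv         = tn / (tn + fn)              # true negatives / all predicted negatives
    print(round(accuracy, 3), round(precision, 3), round(recall, 3))   # 0.625 0.667 0.5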

This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly.
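
To make that point concrete, here is a small demonstration reusing the confusion_counts sketch from above (the 99-healthy/1-sick split is hypothetical, chosen to match the earlier example): a classifier that always predicts "healthy" reaches 99% accuracy yet never detects the sick individual.

    # Hypothetical imbalanced population: 99 healthy (0), 1 sick (1).
    actual    = [0] * 99 + [1]
    predicted = [0] * 100             # always predicts "healthy"
    tp, fn, fp, tn = confusion_counts(actual, predicted, positive=1)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall   = tp / (tp + fn)
    print(accuracy, recall)           # 0.99 0.0

Recall exposes the failure that accuracy hides, which is exactly why the per-class measures above are worth computing.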
