This tutorial explains how to compute confusion matrix in Excel, along with examples.
Please ensure you have the following two columns in Excel:
- Binary Dependent Variable (0 / 1)
- Predicted probabilities
Let's say you have data for dependent variable in cells A2:A100 and probabilities in cells B2:B100.
The confusion matrix is shown in the table below for your reference.
Predicted Negative(0) | Predicted Positive(1) | |
---|---|---|
Actual Negative(0) | True Negative (TN) | False Positive (FP) |
Actual Positive(1) | False Negative (FN) | True Positive (TP) |
Step 1 : Classify Predictions
The first step is to create a column to classify predictions. Let's say probability cutoff is 0.5 which means assigning 1 if the predicted probability is 0.5 or higher, otherwise 0. Enter the following formula in cell C2.
=IF(B2 >= 0.5, 1, 0)
Drag the formula down to the last row (i.e C100) to cover all rows.
Step 2 : Count Positives and Negatives
Enter the following formula in cell D2.
=COUNTIFS(A2:A100, 1, C2:C100, 1)
Enter the following formula in cell E2.
=COUNTIFS(A2:A100,0, C2:C100, 0)
Enter the following formula in cell F2.
=COUNTIFS(A2:A100,1, C2:C100, 0)
Enter the following formula in cell G2.
=COUNTIFS(A2:A100,0, C2:C100, 1)
Step 3 : Calculate Confusion Matrix
The True Positive Rate is also referred to as Recall in the confusion matrix. Enter the following formula in cell H2.
=D2 / (D2 + F2)
Enter the following formula in cell I2.
=G2 / (G2 + E2)
Enter the following formula in cell J2.
=E2/(E2+G2)
Enter the following formula in cell K2.
=F2 / (F2 + D2)
Enter the following formula in cell L2.
=(D2+E2)/COUNT(C2:C100)
Enter the following formula in cell M2.
=D2/(D2+G2)
F1 score is the harmonic mean of precision and recall. It performs well on a dataset having low event rate.
=2 * (H2 * M2) / (H2 + M2)
The following VBA code is a user-defined function that automates all the steps shown above to compute confusion matrix.
Function ConfusionMatrix(predictions As Range, actuals As Range, threshold As Double) As Variant
Dim n As Long
Dim tpr As Double, fpr As Double, tnr As Double, fnr As Double, accuracy As Double
Dim sorted_indices() As Long
Dim i As Long
' Check if the predictions and actuals ranges have the same number of elements
If predictions.Count <> actuals.Count Then
MsgBox "Predictions and actuals must have the same number of elements."
Exit Function
End If
n = predictions.Count
ReDim sorted_indices(1 To n)
' Sort predictions and actuals in descending order of predictions
sorted_indices = SortIndicesDescending(predictions)
Dim num_pos As Long, num_neg As Long
num_pos = WorksheetFunction.CountIf(actuals, 1)
num_neg = n - num_pos
If num_pos = 0 Or num_neg = 0 Then
MsgBox "There must be both positive and negative actual values."
Exit Function
End If
Dim tp_count As Long, fp_count As Long, tn_count As Long, fn_count As Long
tp_count = 0
fp_count = 0
tn_count = 0
fn_count = 0
' Calculate TP, FP, TN, FN for the specified threshold
For i = 1 To n
If predictions.Cells(sorted_indices(i)).Value >= threshold Then
If actuals.Cells(sorted_indices(i)).Value = 1 Then
tp_count = tp_count + 1
Else
fp_count = fp_count + 1
End If
Else
If actuals.Cells(sorted_indices(i)).Value = 1 Then
fn_count = fn_count + 1
Else
tn_count = tn_count + 1
End If
End If
Next i
tpr = Round(tp_count / num_pos, 4)
fpr = Round(fp_count / num_neg, 4)
fnr = Round(fn_count / num_pos, 4)
tnr = Round(tn_count / num_neg, 4)
accuracy = Round((tp_count + tn_count) / n, 4)
' Display TPR, FPR, TNR, and FNR in one message box
ConfusionMatrix = "Accuracy: " & accuracy & ", " & _
"True Positive Rate: " & tpr & ", " & _
"False Positive Rate: " & fpr & ", " & _
"True Negative Rate: " & tnr & ", " & _
"False Negative Rate: " & fnr
End Function
- In Excel, open the VBA editor by pressing Alt + F11 keyboard shortcut key.
- Select Insert > Module to create a module.
- Paste the above VBA code in the module.
- Close the VBA editor by clicking the 'X' in the top-right corner.
- In any cell, enter this formula :
=ConfusionMatrix(B2:B100,A2:A100,0.5)
It follows this syntax -=ConfusionMatrix(probability_range,dependent_variable_range,cutoff)
How to Create ROC Curve in Excel
Excel Template : Gain and Lift Charts
Free Excel Add-In for Logistic Regression