
Has anyone been able to match the sklearn confusion matrix to H2O's?

They never match...

Doing something similar with Keras produces a perfect match.

But in H2O they are always off. I've tried it every which way...

Borrowed some code from: Any difference between H2O and Scikit-Learn metrics scoring?

# In[30]:
import pandas as pd
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()

# Import a sample binary outcome train/test set into H2O
train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")

# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

# Train and cross-validate a GBM
model = H2OGradientBoostingEstimator(distribution="bernoulli", seed=1)
model.train(x=x, y=y, training_frame=train)

# In[31]:
# Test AUC
model.model_performance(test).auc()
# 0.7817203808052897

# In[32]:

# Generate predictions on a test set
pred = model.predict(test)

# In[33]:

from sklearn.metrics import roc_auc_score, confusion_matrix

pred_df = pred.as_data_frame()
y_true = test[y].as_data_frame()

# AUC is threshold-free, so this should closely match the H2O test AUC above
roc_auc_score(y_true[y], pred_df['p1'])

# In[36]:

y_true = test[y].as_data_frame()[y].values
cm = pd.DataFrame(confusion_matrix(y_true, pred_df['predict'].values))

# In[37]:

print(cm)
    0     1
0  1354   961
1   540  2145

# In[38]:
model.model_performance(test).confusion_matrix()

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.353664307031828:

        0       1       Error   Rate
0       964.0   1351.0  0.5836  (1351.0/2315.0)
1       274.0   2411.0  0.102   (274.0/2685.0)
Total   1238.0  3762.0  0.325   (1625.0/5000.0)

# In[39]:
h2o.cluster().shutdown()
  • The values you passed to the scikit-learn confusion matrix are based on a different threshold (the training threshold for best F1), while `model_performance(test).confusion_matrix()` uses the threshold `0.35366..`, hence the results differ (see the sketch after these comments). – Vivek Kumar Aug 16 '18 at 07:33
  • Just print `model` to get details about it. – Vivek Kumar Aug 16 '18 at 07:33
  • Hi @VivekKumar, I did almost the same thing following your advice but still didn't get the same results. Please have a look at my answer below and check if I made some mistakes. – Anastasiya-Romanova 秀 Jul 20 '20 at 13:47
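
A quick way to see the two thresholds Vivek mentions (a minimal sketch, assuming the `model`, `train`, and `test` frames from the question, and assuming H2O's default labeling threshold is the max-F1 threshold from the training metrics when no validation frame is given):

# Threshold H2O used to label pred['predict'] (max F1 on the training data)
train_thr = model.model_performance(train).find_threshold_by_max_metric('f1')

# Threshold used by model_performance(test).confusion_matrix() (max F1 on the test data)
test_thr = model.model_performance(test).find_threshold_by_max_metric('f1')

# The two thresholds differ, which is why the matrices don't line up
print(train_thr, test_thr)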

2 Answers


This does the trick, thanks for the hunch, Vivek. Still not an exact match, but extremely close.

# Threshold H2O chose on the TRAINING data (max F1)
perf = model.model_performance(train)
threshold = perf.find_threshold_by_max_metric('f1')
# Evaluate the test set at that same threshold
model.model_performance(test).confusion_matrix(thresholds=threshold)
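
To double-check from the pandas side, you can cut `p1` at that same threshold and rebuild the sklearn matrix (a sketch reusing `pred_df` and `y_true` from the question; whether H2O treats `p1 == threshold` as 0 or 1 is an assumption here, and that boundary could account for a tiny residual mismatch):

from sklearn.metrics import confusion_matrix

# Label manually at H2O's training max-F1 threshold instead of using pred_df['predict']
manual_pred = (pred_df['p1'] >= threshold).astype(int)
print(confusion_matrix(y_true, manual_pred))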
  • Yes. That's why I did not post that as an answer: with the training threshold I was able to get close, but not exactly the same. I think you should post this on [the H2O issues here](https://github.com/h2o/h2o/issues), so that you can get a confirmed answer from the developers. – Vivek Kumar Aug 17 '18 at 08:00

I also ran into the same issue. Here is what I would do to make a fair comparison:

# Train with the test set supplied as the validation frame
model.train(x=x, y=y, training_frame=train, validation_frame=test)
# Confusion matrix at the max-F1 threshold on the validation frame
cm1 = model.confusion_matrix(metrics=['F1'], valid=True)

Since the model is trained with a validation frame, `pred['predict']` will use the threshold that maximizes the F1 score on the validation data. To verify, one can use these lines:

# Max-F1 threshold on the validation frame
threshold = model.find_threshold_by_max_metric(metric='f1', valid=True)
pred_df['predict'] = pred_df['p1'].apply(lambda x: 0 if x < threshold else 1)

To get the corresponding confusion matrix from scikit-learn:

from sklearn.metrics import confusion_matrix

cm2 = confusion_matrix(y_true, pred_df['predict'])
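
For completeness, `pred_df` and `y_true` here are assumed to carry over from the question's setup, i.e.:

pred_df = model.predict(test).as_data_frame()
y_true = test[y].as_data_frame()[y].values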

In my case, I still get slightly different results, and I don't understand why. For example:

print(cm1)
>> [[3063  176]
    [  94  146]]

print(cm2)
>> [[3063  176]
    [  95  145]]
  • Maybe there's some rounding happening here. Please print the model threshold using `print(model)` and compare it with the threshold found by `find_threshold_by_max_metric` (see the diagnostic sketch below). – Vivek Kumar Jul 21 '20 at 03:20
  • Also, as you can see in the other answer's discussion, even we were not able to get exactly the same results. So maybe posting to the H2O GitHub issues may help. – Vivek Kumar Jul 21 '20 at 03:22
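
One way to chase down the off-by-one row is to compare H2O's own labels with a manual cut at the same threshold; any disagreements should cluster around `p1 == threshold`, where `>=` vs `>` boundary handling or float rounding of the threshold can flip a single row. A diagnostic sketch under those assumptions, reusing `model`, `test`, `pred_df`, and `threshold` from above:

import numpy as np

# H2O's own labels vs. a manual cut at the same threshold
h2o_labels = model.predict(test).as_data_frame()['predict'].values
manual_labels = (pred_df['p1'].values >= threshold).astype(int)

# Disagreements, if any, should sit at or extremely close to the threshold
diff = np.flatnonzero(h2o_labels != manual_labels)
print(len(diff), 'disagreement(s)')
print(pred_df['p1'].iloc[diff])
print('threshold:', threshold)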