How to interpret the probabilities (p0, p1) of the result of h2o.predict()

Question

I would like to understand the meaning of the values returned by the h2o.predict() function from the H2O R package. I noticed that in some cases, when the predict column is 1, the p1 column has a lower value than the p0 column. My understanding is that the p0 and p1 columns are the probabilities of each class, so I expected that when predict=1, p1 would be higher than the probability of the opposite class (p0). That is not always the case, as the following example with the prostate dataset shows.

Here is an executable example:

library(h2o)
h2o.init(max_mem_size = "12g", nthreads = -1)
prostate.hex <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
prostate.hex$CAPSULE  <- as.factor(prostate.hex$CAPSULE)
prostate.hex$RACE     <- as.factor(prostate.hex$RACE)
prostate.hex$DCAPS    <- as.factor(prostate.hex$DCAPS)
prostate.hex$DPROS    <- as.factor(prostate.hex$DPROS)

prostate.hex.split = h2o.splitFrame(data = prostate.hex,
  ratios = c(0.70, 0.20, 0.10), seed = 1234)
train.hex     <- prostate.hex.split[[1]]
validate.hex  <- prostate.hex.split[[2]]
test.hex      <- prostate.hex.split[[3]]

fit <- h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"),
  training_frame = train.hex,
  validation_frame = validate.hex,
  family = "binomial", nfolds = 0, alpha = 0.5)

prostate.predict = h2o.predict(object = fit, newdata = test.hex)
result <- as.data.frame(prostate.predict)
subset(result, predict == 1 & p1 < 0.4)

I get the following output for the result of the subset function:

   predict        p0        p1
11       1 0.6355974 0.3644026
17       1 0.6153021 0.3846979
23       1 0.6289063 0.3710937
25       1 0.6007919 0.3992081
31       1 0.6239587 0.3760413

For all of the above observations from the test.hex dataset, the prediction is 1, but p0 > p1.

The total number of observations where predict=1 but p1 < p0 is:

>   nrow(subset(result, predict == 1 & p1 < p0))
[1] 14

By contrast, there are no observations with predict=0 where p0 < p1:

>   nrow(subset(result, predict == 0 & p0 < p1))
[1] 0

Here is the frequency table for predict:

> table(result$predict)

 0  1 
18 23

The response variable is CAPSULE, with the following levels:

> levels(as.data.frame(prostate.hex)$CAPSULE)
[1] "0" "1"

Any suggestion?

Note: The question on a similar topic, How to interpret results of h2o.predict, does not address this specific issue.

Darren Cook · Answer 1 · 2018-09-13T18:03:25.473

What you are describing is a threshold of 0.5. In fact, a different threshold is used: the one that maximizes a certain metric. The default metric is F1 (*); if you print the model information you can find the thresholds used for each metric.
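To make the idea of a metric-maximizing threshold concrete, here is a minimal sketch in plain R. It is a simplified illustration, not H2O's implementation: the labels and probabilities are made up, and the helper name `f1_at` is hypothetical.

```r
# F1 score obtained when class 1 is predicted for p1 >= threshold.
f1_at <- function(threshold, actual, p1) {
  predicted <- as.integer(p1 >= threshold)
  tp <- sum(predicted == 1 & actual == 1)  # true positives
  fp <- sum(predicted == 1 & actual == 0)  # false positives
  fn <- sum(predicted == 0 & actual == 1)  # false negatives
  2 * tp / (2 * tp + fp + fn)
}

# Made-up labels and predicted probabilities of class 1 (illustration only).
actual <- c(0, 0, 1, 0, 1, 1)
p1     <- c(0.10, 0.30, 0.35, 0.40, 0.60, 0.90)

# Try every observed probability as a candidate threshold, keep the best one.
candidates     <- sort(unique(p1))
f1_scores      <- sapply(candidates, f1_at, actual = actual, p1 = p1)
best_threshold <- candidates[which.max(f1_scores)]
best_threshold
```

In this toy data the threshold with the highest F1 falls below 0.5, so a row with p1 < p0 can still be labelled 1, which is the effect described in the question.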

See the question: How to understand the metrics of H2OModelMetrics Object through h2o.performance? for more on this (your question was different, which was why I didn't mark it as a duplicate).

As far as I know, you cannot change the F1 default for either h2o.predict() or h2o.performance(). Instead, you can use h2o.confusionMatrix().

Given your model fit, and to use max F2 instead:

h2o.confusionMatrix(fit, metrics = "f2")

You can also just use the h2o.predict() "p0" column directly, with your own threshold, instead of the "predict" column. (That is what I have done, before.)
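A minimal sketch of that approach, in plain R so it runs without an H2O cluster. The rows are copied from the question's output; the 0.5 cutoff and the column name `my_predict` are assumptions chosen for illustration. Because p0 = 1 - p1 for a binomial model, thresholding p1 at 0.5 is equivalent to thresholding p0 at 0.5 in the other direction.

```r
# A few (p0, p1) rows copied from the question's output.
result <- data.frame(
  p0 = c(0.6355974, 0.6153021, 0.3645771, 0.7032746),
  p1 = c(0.3644026, 0.3846979, 0.6354229, 0.2967254)
)

# Assumed cutoff for illustration; pick one that suits your own costs.
threshold <- 0.5

# Ignore the "predict" column and derive the label from p1 directly.
result$my_predict <- as.integer(result$p1 >= threshold)
result
```

With a real model, the same assignment can be applied to as.data.frame(h2o.predict(...)), as in the question's code.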

*: The definition is here: https://github.com/h2oai/h2o-3/blob/fdde85e41bad5f31b6b841b300ce23cfb2d8c0b0/h2o-core/src/main/java/hex/AUC2.java#L34 Further down that file also shows how each of the metrics is calculated.

Based on @SandipanDey's response, the metric seems to be `max f1` and not `max f0point5` in this case. There is not much documentation for `h2o.predict()`. Can you provide a hint or a link on how to specify a different performance metric for the prediction function? My classes are imbalanced, so rather than `max f1` I guess it would be more convenient to use `max f2`, which penalizes false negatives more. I guess I have to specify some parameter via the `...` argument. — David Leal, Sep 13 '18 at 15:01
@DavidLeal I updated my answer (re. F1) before your comment, but you probably already had it open and didn't see. Use `h2o.confusionMatrix` to specify your own threshold. — Darren Cook, Sep 13 '18 at 17:59

Sandipan Dey · Accepted Answer · 2018-09-13T20:15:38.527

It seems (also see here) that the threshold that maximizes the F1 score on the validation dataset is used as the default threshold for classification with h2o.glm(). We can observe the following:

The threshold value that maximizes the F1 score on the validation dataset is 0.363477.
All data points with a predicted p1 probability below this threshold are classified as class 0 (the highest p1 among data points predicted as class 0 is 0.3602365 < 0.363477).
All data points with a predicted p1 probability above this threshold are classified as class 1 (the lowest p1 among data points predicted as class 1 is 0.3644026 > 0.363477).

min(result[result$predict==1,]$p1)
# [1] 0.3644026
max(result[result$predict==0,]$p1)
# [1] 0.3602365

# Thresholds found by maximizing the metrics on the training dataset
fit@model$training_metrics@metrics$max_criteria_and_metric_scores 
#Maximum Metrics: Maximum metrics at their respective thresholds
#                        metric threshold    value idx
#1                       max f1  0.314699 0.641975 200
#2                       max f2  0.215203 0.795148 262
#3                 max f0point5  0.451965 0.669856  74
#4                 max accuracy  0.451965 0.707581  74
#5                max precision  0.998285 1.000000   0
#6                   max recall  0.215203 1.000000 262
#7              max specificity  0.998285 1.000000   0
#8             max absolute_mcc  0.451965 0.395147  74
#9   max min_per_class_accuracy  0.360174 0.652542 127
#10 max mean_per_class_accuracy  0.391279 0.683269  97

# Thresholds found by maximizing the metrics on the validation dataset
fit@model$validation_metrics@metrics$max_criteria_and_metric_scores 
#Maximum Metrics: Maximum metrics at their respective thresholds
#                        metric threshold    value idx
#1                       max f1  0.363477 0.607143  33
#2                       max f2  0.292342 0.785714  51
#3                 max f0point5  0.643382 0.725806   9
#4                 max accuracy  0.643382 0.774194   9
#5                max precision  0.985308 1.000000   0
#6                   max recall  0.292342 1.000000  51
#7              max specificity  0.985308 1.000000   0
#8             max absolute_mcc  0.643382 0.499659   9
#9   max min_per_class_accuracy  0.379602 0.650000  28
#10 max mean_per_class_accuracy  0.618286 0.702273  11

result[order(result$predict),]
#   predict          p0        p1
#5        0 0.703274569 0.2967254
#6        0 0.639763460 0.3602365
#13       0 0.689557497 0.3104425
#14       0 0.656764541 0.3432355
#15       0 0.696248328 0.3037517
#16       0 0.707069611 0.2929304
#18       0 0.692137408 0.3078626
#19       0 0.701482762 0.2985172
#20       0 0.705973644 0.2940264
#21       0 0.701156961 0.2988430
#22       0 0.671778898 0.3282211
#24       0 0.646735016 0.3532650
#26       0 0.646582708 0.3534173
#27       0 0.690402957 0.3095970
#32       0 0.649945017 0.3500550
#37       0 0.804937468 0.1950625
#40       0 0.717706731 0.2822933
#41       0 0.642094040 0.3579060
#1        1 0.364577068 0.6354229
#2        1 0.503432724 0.4965673
#3        1 0.406771233 0.5932288
#4        1 0.551801718 0.4481983
#7        1 0.339600779 0.6603992
#8        1 0.002978593 0.9970214
#9        1 0.378034417 0.6219656
#10       1 0.596298925 0.4037011
#11       1 0.635597359 0.3644026
#12       1 0.552662241 0.4473378
#17       1 0.615302107 0.3846979
#23       1 0.628906297 0.3710937
#25       1 0.600791894 0.3992081
#28       1 0.216571552 0.7834284
#29       1 0.559174924 0.4408251
#30       1 0.489514642 0.5104854
#31       1 0.623958696 0.3760413
#33       1 0.504691497 0.4953085
#34       1 0.582509462 0.4174905
#35       1 0.504136056 0.4958639
#36       1 0.463076505 0.5369235
#38       1 0.510908093 0.4890919
#39       1 0.469376828 0.5306232

This answer and the one provided by @DarrenCook are both valid responses to my question. I will mark this one as the best because it also gives a numerical explanation based on the specific problem I posted. Both are very valuable responses. — David Leal, Sep 18 '18 at 12:45
