    public BinomialModelPrediction predictBinomial(RowData data) throws PredictException {
        double[] preds = this.preamble(ModelCategory.Binomial, data);
        BinomialModelPrediction p = new BinomialModelPrediction();
        // preds[0] holds the predicted label index
        double d = preds[0];
        p.labelIndex = (int) d;
        String[] domainValues = this.m.getDomainValues(this.m.getResponseIdx());
        p.label = domainValues[p.labelIndex];
        // preds[1..] hold the per-class values, copied in order
        p.classProbabilities = new double[this.m.getNumResponseClasses()];
        System.arraycopy(preds, 1, p.classProbabilities, 0, p.classProbabilities.length);
        if (this.m.calibrateClassProbabilities(preds)) {
            p.calibratedClassProbabilities = new double[this.m.getNumResponseClasses()];
            System.arraycopy(preds, 1, p.calibratedClassProbabilities, 0, p.calibratedClassProbabilities.length);
        }
        return p;
    }

E.g.: classProbabilities = [0.82333, 0.27666], labelIndex = 1, label = true, domainValues = [false, true]

What does labelIndex signify, and is the order of classProbabilities the same as the order of the domain values? If the order is the same, then the probability of false is 0.82333 and the probability of true is 0.27666, so why is labelIndex 1 and label true?

Please help me to figure out this issue.

  • Why do you think the answer is wrong? The threshold used for choosing the predicted class for binomial classification problems is max-F1. If you don't like that threshold, then you can do the thresholding yourself. – TomKraljevic Nov 03 '17 at 16:12

1 Answer

As Tom commented, the prediction is not "wrong". You can infer from it that the threshold H2O has chosen is less than 0.27666. You probably have imbalanced training data; otherwise H2O would not have picked a threshold low enough to classify a predicted value of 0.27666 as a 1. Does your training set include fewer examples of the positive class than the negative class?

If you don't like that threshold for whatever reason, then you can manually create your own. Just make sure you know how to properly evaluate the effect of using different thresholds on the performance of your model, otherwise I'd recommend just using the default threshold.
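As a minimal sketch of applying your own threshold, assuming (as in the question) that index 1 of classProbabilities corresponds to the positive class `true`: the class name `CustomThreshold`, the `labelIndex` helper, and the cutoffs 0.2 and 0.5 are made up for illustration and are not part of H2O's API.

```java
public class CustomThreshold {
    /**
     * Returns 1 (positive class) when the positive-class value reaches
     * the threshold, otherwise 0. Hypothetical helper, not an H2O method.
     */
    public static int labelIndex(double[] classProbabilities, double threshold) {
        double positive = classProbabilities[1]; // ignore the negative-class value
        return positive >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        String[] domainValues = {"false", "true"};
        double[] classProbabilities = {0.82333, 0.27666}; // values from the question

        // A low cutoff labels this row as the positive class...
        System.out.println(domainValues[labelIndex(classProbabilities, 0.2)]);
        // ...while a cutoff of 0.5 labels the same row as the negative class.
        System.out.println(domainValues[labelIndex(classProbabilities, 0.5)]);
    }
}
```

The same row gets a different label depending only on the cutoff, which is why the label in the question can be `true` even though the value for `false` is larger.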

The name "classProbabilities" is a misnomer. These are not actual probabilities; they are predicted values, though people often use the terms interchangeably. Binary classification algorithms produce "predicted values" that look like probabilities because they lie between 0 and 1, but unless a calibration process is performed, they do not represent the true probabilities. Calibration is not necessarily a straightforward process, and there are many techniques. Here's some more info about calibration methods for imbalanced data. In H2O, you can perform calibration with Platt scaling via the calibrate_model option. But this is probably not necessary for what you're trying to do.

The proper way to use the raw output from a binary classification model is to only look at the predicted value for the positive class (you can simply ignore the predicted value for the negative class). Then you choose a threshold which suits your needs, or you can use the default threshold in H2O, which is chosen to maximize the F1 score. Some other software will use a hardcoded threshold of 0.5, but that will be a terrible choice if you don't have an even number of positive and negative examples in your training data. If you have only a few positive examples in your training data, then the best threshold will be something much lower than 0.5.
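To illustrate why a threshold that maximizes F1 can land well below 0.5 on imbalanced data, here is a rough sketch that tries each observed score as a candidate threshold. It is not H2O's implementation; the class `MaxF1Threshold`, its methods, and the toy scores and labels are all invented for this example.

```java
public class MaxF1Threshold {
    /** F1 score when every score >= threshold is predicted positive. */
    public static double f1(double[] scores, int[] labels, double threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predictedPositive = scores[i] >= threshold;
            if (predictedPositive && labels[i] == 1) tp++;
            else if (predictedPositive) fp++;
            else if (labels[i] == 1) fn++;
        }
        // With no true positives, F1 is defined here as 0.
        return tp == 0 ? 0.0 : 2.0 * tp / (2.0 * tp + fp + fn);
    }

    /** Tries each score as a threshold; on ties the first candidate wins. */
    public static double bestThreshold(double[] scores, int[] labels) {
        double best = 1.0;
        double bestF1 = -1.0;
        for (double candidate : scores) {
            double score = f1(scores, labels, candidate);
            if (score > bestF1) {
                bestF1 = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy, imbalanced data: 2 positives among 8 rows (illustrative only).
        double[] scores = {0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.35};
        int[] labels    = {0,    0,    0,    0,    0,    1,    0,    1};

        double t = bestThreshold(scores, labels);
        System.out.println("max-F1 threshold: " + t);
        System.out.println("F1 at that threshold: " + f1(scores, labels, t));
        System.out.println("F1 at 0.5: " + f1(scores, labels, 0.5));
    }
}
```

On this toy data the selected threshold is 0.22, while a fixed cutoff of 0.5 predicts no positives at all and scores an F1 of 0.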

Erin LeDell
  • Thanks for answering. – RAHUL TARWAY Nov 06 '17 at 11:29
  • Thanks a lot. My training data is skewed and has more negative examples. I am new to H2O and have some further questions. 1) Is there any H2O documentation on selecting a threshold, making predictions, and using calibration? 2) How can I use the threshold for an AutoML build? – RAHUL TARWAY Nov 06 '17 at 11:42