
I am trying to merge a prediction frame into the H2OFrame that contains the features, but the merge method of water.rapids.Merge does not do this.

How should I use the merge method to merge the prediction frame into the features frame? Please also describe the method's parameters so that I can call it properly.

merge(Frame leftFrame, Frame riteFrame, int[] leftCols, int[] riteCols, boolean allLeft, int[][] id_maps) 

merge(Frame leftFrame, Frame riteFrame, int[] leftCols, int[] riteCols, boolean allLeft, int[][] id_maps, int[] ascendingL, int[] ascendingR) 

What are the int[][] id_maps, int[] leftCols, and int[] riteCols parameters?

What is the right way to merge the prediction frame into the features frame?

James Z
poojanavin

2 Answers

0

To answer your main question, use add():

val predFrame = gbmModel.predict(dataFrame)
val dataAndPredFrame = dataFrame.add(predFrame)

(Shamelessly stolen from https://github.com/h2oai/sparkling-water/issues/194 )

merge() is like an SQL join, and is for when you have two data frames of different sizes; the arguments you are asking about are used to specify which columns in each of the two frames need to match for the join to happen.

I cannot seem to find any sparkling water documentation for it (please post in the comments if anyone knows where it is!), but you can get the idea from looking at the R or Python API docs: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/merging-data.html
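
To make the distinction concrete, here is a minimal plain-Python sketch (no H2O involved) contrasting a positional column append with a key-based inner join. The first is, as far as I understand it, what add() amounts to when the prediction frame has the same rows in the same order; the second is what an SQL-style merge does. The helpers `append_columns` and `inner_join` are hypothetical names for illustration, and their `left_cols`/`right_cols` arguments only mimic the role that leftCols/riteCols appear to play; none of this is the H2O API.

```python
def append_columns(left_rows, right_rows):
    """Positional append: row i of left is paired with row i of right."""
    if len(left_rows) != len(right_rows):
        raise ValueError("both tables must have the same number of rows")
    return [l + r for l, r in zip(left_rows, right_rows)]


def inner_join(left_rows, right_rows, left_cols, right_cols):
    """Key-based join: rows pair up when their key columns hold equal values."""
    # Index the right-hand rows by their key values.
    index = {}
    for r in right_rows:
        key = tuple(r[c] for c in right_cols)
        index.setdefault(key, []).append(r)
    joined = []
    for l in left_rows:
        key = tuple(l[c] for c in left_cols)
        for r in index.get(key, []):
            # Keep only the right-hand columns that are not part of the key.
            rest = [v for i, v in enumerate(r) if i not in right_cols]
            joined.append(l + rest)
    return joined


# Toy data, invented for illustration.
features = [[1, 0.5], [2, 0.7], [3, 0.1]]   # columns: id, x
predictions = [["yes"], ["no"], ["yes"]]     # one row per feature row
lookup = [[3, "c"], [1, "a"]]                # columns: id, label

appended = append_columns(features, predictions)
joined = inner_join(features, lookup, left_cols=[0], right_cols=[0])
```

With this toy data, `appended` keeps all three rows and gains one column, while `joined` keeps only the rows whose id also appears in `lookup`. A prediction frame has no key column to match on, which is why a positional append is the natural fit for it.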

Darren Cook
  • It doesn't work for pysparkling 2.2: `AttributeError: 'H2OFrame' object has no attribute 'add'`. Your suggested h2o solution works only on data that fits completely into RAM, which is not the case with Spark + H2O. – wind May 16 '18 at 08:52
  • @wind Thanks. I think the OP wanted a Scala solution, but your info will be useful for people using pysparkling. (Actually that is just like the normal H2O python API, isn't it.) – Darren Cook May 16 '18 at 21:45
0

I don't believe that h2o maintains the original row order, as stated above. I used h2o.cbind to merge the original data set with the predictions. Then, comparing the actual response values against the predicted values, I reconstructed the confusion matrix. Unfortunately, its counts were very different from those of the confusion matrix produced by the model. If the rows of the predictions were in the same order as the original data set, the confusion matrix counts should be the same inside the R script as well as outside it.
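
The check described above can be sketched in plain Python (not H2O or R code). The labels are toy values made up for illustration, and `confusion_counts` is a hypothetical helper. The sketch rebuilds the confusion counts from two columns paired by row position, then shows how the counts change when one column is reordered, which is what a row-order mismatch would look like.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs, pairing values by row position."""
    if len(actual) != len(predicted):
        raise ValueError("columns must have the same length")
    return Counter(zip(actual, predicted))


# Toy labels, invented for illustration only.
actual = ["a", "a", "b", "b"]
predicted = ["a", "b", "b", "b"]

aligned = confusion_counts(actual, predicted)
# Reversing one column simulates predictions that lost their row order.
misaligned = confusion_counts(actual, list(reversed(predicted)))
```

If the counts rebuilt this way differ from the model's own confusion matrix, a row-order mismatch is one possible cause. Another worth ruling out is that the model's matrix was computed on different data, or with a different classification threshold, than the predictions being compared.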