I want to do a cross join between two H2OFrames. I am looking for a workaround that stays strictly within H2OFrames.

col1.1 <- c('A', 'B', 'E', 'C', 'F', 'D')
dummy <- rep(1,6)

d1.hex <- as.h2o( cbind( col1.1, dummy ) )

col2.1 <- c('xx', 'yy', 'zz', 'ww')

dummy <- rep(1,4)

d2.hex <- as.h2o( cbind( col2.1, dummy ) )

If I use all = TRUE, it throws Error: unimplemented

h2o.merge(d1.hex, d2.hex, all = TRUE)

If I use the defaults, the result is not a cross join:

h2o.merge(d1.hex, d2.hex )

dummy  col1.1  col2.1
1      A       xx
1      B       xx
1      E       xx
1      C       xx
1      F       xx
1      D       xx

I have tried changing the data type of the joining column to categorical or numeric, but without success. I would appreciate your help in resolving the issue.

Thank you

Vikash Kumar
  • I'm not too familiar with h2o, but that is where you're running into the issue. `merge(df1, df2, all = TRUE)` is the correct way to cross join in R, it seems. – Matt W. Nov 27 '17 at 16:40
  • Thank you @MattW. But I am looking for a solution with H2OFrames. I am trying to use H2O end to end, from reading the file to making predictions, and my data set is huge. – Vikash Kumar Nov 27 '17 at 18:13

1 Answer


The frustrating answer is that you cannot, and there are already two bug reports for it:

https://0xdata.atlassian.net/browse/PUBDEV-4516

https://0xdata.atlassian.net/browse/PUBDEV-3699

The simplest workaround is to download all your data, and do it in the R client. But with big data that may not be possible. If you must do it in the H2O cluster you will need a loop:

  1. Copy the rows with the first unique value in d1.hex into tmp.
  2. tmp2 = h2o.merge(tmp, d2.hex, all.y = TRUE)

Repeat for each unique value in d1.hex. Then, at the end, do an h2o.rbind() on all your tmp2 tables.
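The loop described above can be sketched roughly as follows. This is a minimal, untested-against-your-cluster sketch, not a definitive implementation: it assumes a running H2O cluster, rebuilds the two frames from data.frames (with col1.1 as a factor so it is parsed as a categorical), and the names keys, parts and crossed are made up for illustration. It also assumes that h2o.merge() with all.y = TRUE behaves as a right join in your H2O version.

```r
library(h2o)
h2o.init()

# Same data as in the question, built from data.frames so the column
# types are explicit (col1.1 is a factor, i.e. an enum column in H2O).
d1.hex <- as.h2o(data.frame(col1.1 = factor(c("A", "B", "E", "C", "F", "D")),
                            dummy  = 1))
d2.hex <- as.h2o(data.frame(col2.1 = factor(c("xx", "yy", "zz", "ww")),
                            dummy  = 1))

# Only the small vector of unique keys is pulled into the R client;
# the frames themselves stay in the cluster.
keys <- as.character(as.data.frame(h2o.unique(d1.hex$col1.1))[[1]])

# Steps 1 and 2, repeated for each unique value in d1.hex.
parts <- lapply(keys, function(k) {
  tmp <- d1.hex[d1.hex$col1.1 == k, ]      # rows of d1.hex with this key
  h2o.merge(tmp, d2.hex, all.y = TRUE)     # joined on the shared "dummy" column
})

# Stack all the per-key results into one frame.
crossed <- do.call(h2o.rbind, parts)
```

Each iteration issues its own merge in the cluster, so this will be slow when d1.hex has many unique values. If the data does fit in the R client, `merge(as.data.frame(d1.hex), as.data.frame(d2.hex), by = "dummy")` gives the cross join directly, because every row shares the same dummy value.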

Or, the classic open source solution: implement the unimplemented code yourself (or beg/pay h2o.ai to implement it).

Darren Cook