I'm currently working with h2o.xgboost in h2o version 3.26.0.2 and I get a java.lang.NullPointerException (full error below).

The dataset is a 25 GB CSV with 6,000,000 rows (train + test), and the info for the cluster I use is:

R is connected to the H2O cluster: 
    H2O cluster uptime:         30 minutes 18 seconds 
    H2O cluster timezone:       Europe/Madrid 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.26.0.2 
    H2O cluster version age:    7 months and 12 days !!! 
    H2O cluster name:           H2O_started_from_R_xxx
    H2O cluster total nodes:    1 
    H2O cluster total memory:   343.27 GB 
    H2O cluster total cores:    1 
    H2O cluster allowed cores:  48 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        60576 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4 
    R Version:                  R version 3.5.1 (2018-07-02) 

The h2o column types of the data are:

 types   N
1:   int 319
2:  real 316

And my code:

start <- proc.time()
model_trans.h2o <- h2o::h2o.xgboost( 
  distribution              = "gaussian"
  ,model_id                 = "xgb_test"
  ,training_frame           = data_train.h2o
  ,validation_frame         = data_test.h2o
  ,x                        = vars
  ,y                        = target_str
  ,seed                     = 1234
  ,ntrees                   = 1500
  ,learn_rate               = 0.08
  ,col_sample_rate_per_tree = 0.8
  ,max_depth                = 5
  ,verbose                  = FALSE
)
end <- proc.time(); end - start;

The error usually appears after about 2 minutes. I have run the following tests:

  1. Use fewer columns. With 500 columns it works fine (I did not use a random sample, just 1:500 from colnames).
  2. Use less data. If I reduce the data to, let's say, 2,000,000 rows, it also works fine.
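
For anyone who wants to reproduce the two tests, here is a minimal sketch of how the reduced inputs could be built. It runs on a small synthetic frame that stands in for my real data_train.h2o, vars and target_str. The names vars_subset and data_train_small.h2o are hypothetical, and the use of h2o.splitFrame to cut the rows is an assumption made for illustration (any way of taking fewer rows would do).

```r
library(h2o)
h2o.init()  # assumption: starts or connects to a local cluster

# Hypothetical stand-in for the real data (the real frame has 635 columns)
set.seed(1234)
df <- as.data.frame(matrix(rnorm(1000 * 20), ncol = 20))
df$target <- rnorm(1000)
data_train.h2o <- as.h2o(df)
target_str <- "target"
vars <- setdiff(colnames(data_train.h2o), target_str)

# Test 1: fewer columns, taking the first names rather than a random sample
# (1:500 on the real data, 1:10 on this toy frame)
vars_subset <- vars[1:10]

# Test 2: fewer rows; h2o.splitFrame returns a list of H2OFrames,
# the first one holding roughly `ratios` of the rows
parts <- h2o.splitFrame(data_train.h2o, ratios = 1/3, seed = 1234)
data_train_small.h2o <- parts[[1]]

# Train on the reduced inputs (only if the XGBoost extension is available)
if (h2o.xgboost.available()) {
  model_small <- h2o.xgboost(
    x              = vars_subset,
    y              = target_str,
    training_frame = data_train_small.h2o,
    distribution   = "gaussian",
    ntrees         = 10,
    max_depth      = 5,
    seed           = 1234
  )
}
```

On the real data, each test changes only one thing at a time: either x is shortened while the full frame is kept, or the frame is cut while the full vars is kept.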

After these tests, my guess is that h2o.xgboost is somehow not prepared to handle this much data, and that it crashes when it tries to expand its internal "matrix".

Could this be a bug in this h2o version?

Can it be solved without updating the version? Since h2o models are not compatible across versions, updating could be a problem for me.

Thanks in advance!

Note 1:

I did some research on the subject beforehand and only found:

What is a NullPointerException, and how do I fix it? (an SO discussion about null pointer exceptions)

https://0xdata.atlassian.net/browse/PUBDEV-6921 An h2o forum post without a clear answer (this second one is pretty much useless)

Note 2: The full error is:

java.lang.NullPointerException

java.lang.NullPointerException
    at hex.tree.xgboost.matrix.SparseMatrixFactory$NestedArrayPointer.set(SparseMatrixFactory.java:87)
    at hex.tree.xgboost.matrix.SparseMatrixFactory$InitializeCSRMatrixFromChunkIdsMrFun.map(SparseMatrixFactory.java:178)
    at water.LocalMR.compute2(LocalMR.java:84)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1417)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
