I have been intermittently getting a DistributedException when running a sample Iris model in Sparkling Water.

Sparkling Water: 2.1
Spark Streaming Kafka: 0.10.0.0
Running locally using spark-submit, master only

DistributedException from xxx:54321, caused by java.lang.NullPointerException
    at water.MRTask.getResult(MRTask.java:478)
    at water.MRTask.getResult(MRTask.java:486)
    at water.MRTask.doAll(MRTask.java:390)
    at water.MRTask.doAll(MRTask.java:396)
    at hex.Model.predictScoreImpl(Model.java:1103)
    at hex.Model.score(Model.java:964)
    at hex.Model.score(Model.java:932)
    ....
Caused by: java.lang.NullPointerException
    at water.fvec.Vec.chunkForChunkIdx(Vec.java:1014)
    at water.fvec.CategoricalWrappedVec.chunkForChunkIdx(CategoricalWrappedVec.java:49)
    at water.MRTask.compute2(MRTask.java:618)
    at water.MRTask.compute2(MRTask.java:591)
    at water.MRTask.compute2(MRTask.java:591)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1223)
    at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1219)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Lalit Agarwal

1 Answer

The problem is that the H2O model cannot see the data it is asked to score, which causes the NPE. The main reason could be that the H2O data frame is deleted either during prediction or just before the prediction call.

It would help to know how you process the mini-batch data, i.e. how each mini-batch is transformed into an H2O data frame.

It would also help if you explained how the H2O model is being called to make the prediction.
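To illustrate the suspected failure mode, here is a minimal toy sketch in plain Java. It does not use H2O or Sparkling Water at all: `STORE`, `publish`, `delete`, `score` and `scoreThenDelete` are hypothetical names standing in for a key-value store that holds frame data, and the "model" just sums the values. It only shows why scoring a frame after its data has been removed ends in a NullPointerException, and how keeping the delete strictly after the score avoids that.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of a frame store. None of these names are H2O APIs;
// they are hypothetical and exist only for this illustration.
public class FrameLifecycleSketch {
    static final Map<String, double[]> STORE = new ConcurrentHashMap<>();

    // Publish a mini-batch under a key, as a frame conversion might.
    static String publish(String key, double[] rows) {
        STORE.put(key, rows);
        return key;
    }

    // Remove the frame's data; later lookups by this key return null.
    static void delete(String key) {
        STORE.remove(key);
    }

    // "Score" by summing the rows. The chunk is dereferenced without a
    // null check, so a deleted frame fails with NullPointerException.
    static double score(String key) {
        double[] chunk = STORE.get(key);
        double sum = 0.0;
        for (double v : chunk) {
            sum += v;
        }
        return sum;
    }

    // Safe ordering: score first, delete only once scoring has finished.
    static double scoreThenDelete(String key) {
        try {
            return score(key);
        } finally {
            delete(key);
        }
    }

    public static void main(String[] args) {
        String key = publish("batch-1", new double[] {1.0, 2.0, 3.0});
        System.out.println("scored: " + scoreThenDelete(key));
        try {
            score(key); // the frame is gone by now
        } catch (NullPointerException e) {
            System.out.println("scoring a deleted frame failed with NPE");
        }
    }
}
```

In a streaming job the equivalent check would be whether anything (cleanup code, another batch reusing the same frame key) can remove the frame while the score call for that batch is still running.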

AvkashChauhan
  • I am not sure what exactly was wrong with my code, but when I ran the same code on a cluster instead of locally, it worked fine. I think the issue was mostly with my local network settings. Sorry for the late response. – Lalit Agarwal Apr 04 '17 at 18:41