Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark.

From the Sparkling Water GitHub repository:

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames) as H2O's frames and vice versa.
  • DSL to use Spark data structures as input for H2O's algorithms.
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs.
  • Python interface enabling use of Sparkling Water directly from PySpark.

Getting Started

  • Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release; for example, for Spark 1.6 use the Sparkling Water branch for version 1.6.
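As a rough illustration of the version rule above, the sketch below maps a Spark version string to a branch name. The `rel-<major>.<minor>` naming pattern and the helper name `sparkling_water_branch` are assumptions made for this example (the pattern mirrors the `rel-2.2` path seen in the release URL quoted further down); check the repository for the actual branch names before relying on it.

```python
import re


def sparkling_water_branch(spark_version: str) -> str:
    """Return the assumed branch name for a Spark version string.

    Hypothetical helper: assumes branches follow 'rel-<major>.<minor>'.
    """
    # Keep only the leading "<major>.<minor>" part, e.g. "1.6" from "1.6.1".
    match = re.match(r"^(\d+)\.(\d+)", spark_version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {spark_version!r}")
    major, minor = match.groups()
    return f"rel-{major}.{minor}"


print(sparkling_water_branch("1.6.1"))  # prints "rel-1.6"
```

The patch component of the Spark version is ignored on purpose, since the text ties each branch to a Spark release line rather than to an individual patch release.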

Recommended reference sources:

Sparkling Water installation guide
Sparkling Water documentation
Sparkling Water GitHub documentation

129 questions
16
votes
3 answers

How to Setup SPARK_HOME variable?

Following the steps of Sparkling Water from the link http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/0/index.html. Running in terminal : ~/InstallFile/SparklingWater/sparkling-water-2.2.0$ bin/sparkling-shell --conf…
roshan_ray
  • 197
  • 1
  • 1
  • 9
8
votes
1 answer

Difference between spark with h2o and sparkling water

I have a few questions or doubts about Sparkling Water and why it is needed. Let's assume that I have a generated h2o model in both binary and pojo formats. Now I want to deploy the model into production and have an option for using pojo and binary…
Lalit Agarwal
  • 2,354
  • 1
  • 14
  • 18
3
votes
0 answers

H2O Sparkling Water AutoML not working properly in Spark Scala. Exception: Ease restrictions on setMaxModels or setMaxRuntimeSecs

I'm looking into H2O Sparkling Water AutoML using Scala. I'm running it on my laptop on localhost. Even though I'm not adding any restrictions on the H2OAutoML() class using the setMaxModels or setMaxRuntimeSecs methods, the model.fit method fails with an…
3
votes
1 answer

H2O sparkling water error from large Spark Dataframe to H2O Dataframe

When I try to convert from spark dataframe to H2O data frame I get the error below. This seems to have to do with the size of the dataframe because when I make it smaller the converter between spark and H2O works well. Are there any configurations…
Levi Brackman
  • 325
  • 2
  • 17
2
votes
0 answers

How to shade packages inside a fat jar dependency

I've an SBT project that depends on "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop3-2.2.2" which is bringing a recent version of google-api-services-storage. I've also another dependency to Sparkling Water which is a fat jar that seems…
bachr
  • 5,780
  • 12
  • 57
  • 92
2
votes
1 answer

Error H2O cluster should be of size 3 but is 2

I'm trying to run H2O SW on Kubernetes using the steps in the documentation. I launch a test SW app $ bin/spark-submit \ --master k8s://$KUBERNETES_ENDPOINT \ --deploy-mode cluster \ --class ai.h2o.sparkling.InitTest \ --conf…
bachr
  • 5,780
  • 12
  • 57
  • 92
2
votes
2 answers

H2OGridSearch H2OGBM pyspark: NullPointerException in extractH2OParameters

I'm trying to run a grid search for Gradient Boosting Machine in pyspark with H2O Sparkling Water. Produced a reproducible example with the famous iris dataset. from pysparkling import H2OContext, H2OConf import pyspark from pyspark.sql.types import…
lrnzcig
  • 3,868
  • 4
  • 36
  • 50
2
votes
1 answer

Excluding intercept in H2O (python and R) produces non-zero coefficient for intercept anyway

I am trying to use an H2O library in both Python and R to produce a GLM without an intercept included. Unfortunately, it does not appear to be working. The results are completely off, the intercept coefficient is non-zero (only standardized…
Alex
  • 43
  • 1
  • 6
2
votes
2 answers

How to combine prediction with test frame

The merge method of water.rapids.Merge does not merge the prediction frame into the H2OFrame containing the features. How do I use the merge method to merge the prediction frame with the feature frame, and what is the parameter description of this…
poojanavin
  • 31
  • 4
2
votes
2 answers

How to map over DataFrame in spark to extract RowData and make predictions using h2o mojo model

I have a saved h2o model in mojo format, and now I am trying to load it and use it to make predictions on a new dataset (df) as part of a spark app written in scala. Ideally, I wish to append a new row to the existing DataFrame containing the class…
renegademonkey
  • 457
  • 1
  • 7
  • 18
2
votes
2 answers

Adding additional data to each row in an H2OFrame

I am working with a huge H2OFrame (~150gb, ~200 million rows), which I need to manipulate a little. To be more specific: I have to use the frame's ip column, to find the location/city names for each IP and add this information to each of the frame's…
ksbg
  • 3,214
  • 1
  • 22
  • 35
2
votes
1 answer

Sparkling water often throws java.lang.ArrayIndexOutOfBoundsException: 65535

H2O Sparkling Water often throws the exception below, and we rerun the job manually whenever this happens. The issue is that the Spark job doesn't exit when this exception occurs and doesn't return an exit status, so we are not able to automate this process.…
2
votes
1 answer

H2O Spark streaming 2.1 distribution

I have been intermittently getting distribution error when running a sample IRIS model in sparkling water. Sparkling water: 2.1 Spark streaming kafka - 0.10.0.0 Running locally using spark submit - Only master DistributedException from xxx:54321,…
Lalit Agarwal
  • 2,354
  • 1
  • 14
  • 18
2
votes
1 answer

Input line is too long - Spark

I am getting the following error while executing the sparkling-shell2.cmd bat file. I walked through it and I am getting this error while executing spark-shell.cmd with the following parameters cd %TOPDIR% %SPARK_HOME%/bin/spark-shell.cmd --jars…
Mansoor
  • 1,157
  • 10
  • 29
2
votes
3 answers

use sparkling-water via spark packages: com.google.guava... not found

I'm trying to use H2O.ai's sparkling-water via spark packages. I'm following their guide: https://github.com/h2oai/sparkling-water#use-sparkling-water-via-spark-packages I'm on Hortonworks HDP 2.4 with Scala 2.10 and Spark 1.6.1. I put the following…
BlueFeet
  • 2,407
  • 4
  • 21
  • 24