Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast, scalable machine learning engine with Spark. It provides:

Utilities to publish Spark data structures (RDDs, DataFrames) as H2O frames and vice versa
A DSL to use Spark data structures as input for H2O's algorithms
Basic building blocks to create ML applications utilizing Spark and H2O APIs
A Python interface enabling use of Sparkling Water directly from PySpark
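
The frame conversion mentioned above can be sketched from PySpark roughly as follows. This is a hedged illustration, not the project's reference usage: it assumes a recent Sparkling Water release in which `H2OContext.getOrCreate()`, `asH2OFrame` and `asSparkFrame` are available (older releases used different method names), and the helper name `round_trip` is hypothetical.

```python
def round_trip(spark_df):
    """Publish a Spark DataFrame as an H2O frame, then convert it back.

    Assumption: a recent Sparkling Water (pysparkling) release is installed
    and a Spark session is already running.
    """
    # Imported lazily so this module still loads where pysparkling is absent.
    from pysparkling import H2OContext

    # Start (or reuse) the H2O cluster that runs alongside the Spark executors.
    hc = H2OContext.getOrCreate()

    # Spark DataFrame -> H2O frame, usable as input for H2O's algorithms.
    h2o_frame = hc.asH2OFrame(spark_df)

    # H2O frame -> Spark DataFrame, for further processing in Spark.
    return hc.asSparkFrame(h2o_frame)
```

Calling `round_trip(df)` inside a PySpark session should return a Spark DataFrame with the same columns as `df`; without Sparkling Water installed, the call fails at the import.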

Getting Started

Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release; for example, for Spark 1.6, use the Sparkling Water 1.6 branch.
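
A minimal sketch of that version-to-branch mapping, assuming branches are named `rel-<major>.<minor>`. That pattern is an assumption inferred from release paths such as `rel-2.2` that appear in the questions listed on this page; it is not stated in the text above and may not hold for newer releases. The function name `sparkling_water_branch` is hypothetical.

```python
def sparkling_water_branch(spark_version: str) -> str:
    """Return the assumed Sparkling Water branch for a Spark version string.

    Assumption: branches follow the pattern rel-<major>.<minor>.
    """
    parts = spark_version.strip().split(".")
    # Only the major and minor components select the branch.
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"Unrecognized Spark version: {spark_version!r}")
    return f"rel-{parts[0]}.{parts[1]}"


if __name__ == "__main__":
    print(sparkling_water_branch("1.6.1"))
```

Patch-level differences are ignored here, so `1.6.0` and `1.6.1` select the same branch.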

Recommended reference sources:

Sparkling Water installation guide
Sparkling Water documentation
Sparkling Water GitHub documentation

129 questions

3 answers

How to set up the SPARK_HOME variable?

I am following the Sparkling Water steps from http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/0/index.html. Running in the terminal: ~/InstallFile/SparklingWater/sparkling-water-2.2.0$ bin/sparkling-shell --conf…

apache-spark h2o sparkling-water

asked Oct 06 '17 at 20:42

roshan_ray
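
For the SPARK_HOME question above, a quick sanity check of the environment can be sketched as follows. It assumes a Unix-style Spark layout with a `bin/spark-submit` launcher (on Windows the launcher is a `.cmd` file, which this sketch does not cover); the function name `check_spark_home` is hypothetical and not part of Spark or Sparkling Water.

```python
import os
from pathlib import Path


def check_spark_home(env=None):
    """Return a description of the problem with SPARK_HOME, or None if it looks usable.

    Assumption: a usable Spark installation contains bin/spark-submit.
    """
    env = os.environ if env is None else env
    value = env.get("SPARK_HOME")
    if not value:
        return "SPARK_HOME is not set"
    launcher = Path(value) / "bin" / "spark-submit"
    if not launcher.is_file():
        return f"SPARK_HOME is set to {value}, but {launcher} does not exist"
    return None


if __name__ == "__main__":
    problem = check_spark_home()
    print(problem or "SPARK_HOME looks usable")
```

Passing a dictionary instead of the real environment makes the check easy to try out without changing shell settings.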

1 answer

Difference between Spark with H2O and Sparkling Water

I have a few questions about Sparkling Water and why it is needed. Let's assume that I have generated an H2O model in both binary and POJO formats. Now I want to deploy the model into production and have an option for using pojo and binary…

h2o sparkling-water

asked Apr 05 '17 at 16:08

Lalit Agarwal

0 answers

H2O Sparkling Water AutoML not working properly in Spark Scala. Exception: Ease restrictions on setMaxModels or setMaxRuntimeSecs

I'm looking into H2O Sparkling Water AutoML using Scala. I'm running it on my laptop on localhost. Even though I'm not adding any restrictions on the H2OAutoML() class using the setMaxModels or setMaxRuntimeSecs methods, the model.fit method fails with an…

scala apache-spark h2o automl sparkling-water

asked Dec 18 '19 at 19:25

Bhushan Gosavi

1 answer

H2O Sparkling Water error when converting a large Spark DataFrame to an H2O DataFrame

When I try to convert a Spark DataFrame to an H2O data frame, I get the error below. This seems to be related to the size of the DataFrame, because when I make it smaller the conversion between Spark and H2O works well. Are there any configurations…

apache-spark h2o sparklyr sparkling-water

asked Jun 13 '17 at 17:24

Levi Brackman

0 answers

How to shade packages inside a fat jar dependency

I have an SBT project that depends on "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop3-2.2.2", which brings in a recent version of google-api-services-storage. I also have another dependency on Sparkling Water, which is a fat jar that seems…

scala sbt google-cloud-dataproc sbt-assembly sparkling-water

asked Sep 17 '21 at 16:33

bachr

1 answer

Error H2O cluster should be of size 3 but is 2

I'm trying to run H2O SW on Kubernetes using the steps in the documentation. I launch a test SW app $ bin/spark-submit \ --master k8s://$KUBERNETES_ENDPOINT \ --deploy-mode cluster \ --class ai.h2o.sparkling.InitTest \ --conf…

apache-spark kubernetes sparkling-water

asked Nov 25 '20 at 19:28

bachr

2 answers

H2OGridSearch H2OGBM pyspark: NullPointerException in extractH2OParameters

I'm trying to run a grid search for a Gradient Boosting Machine in PySpark with H2O Sparkling Water. I produced a reproducible example with the famous iris dataset: from pysparkling import H2OContext, H2OConf import pyspark from pyspark.sql.types import…

machine-learning pyspark h2o sparkling-water

asked Feb 06 '20 at 12:10

lrnzcig

1 answer

Excluding intercept in H2O (Python and R) produces non-zero coefficient for intercept anyway

I am trying to use an H2O library in both Python and R to produce a GLM without an intercept included. Unfortunately, it does not appear to be working. The results are completely off, the intercept coefficient is non-zero (only standardized…

python r h2o sparkling-water

asked Oct 26 '18 at 11:23

Alex

2 answers

How to combine prediction with test frame

The merge method of water.rapids.Merge is not merging the prediction frame into the H2OFrame containing the features. How do I use the merge method to merge the prediction frame with the feature frame, and what is the parameter description of this…

h2o sparkling-water

asked Mar 23 '18 at 11:36

poojanavin

2 answers

How to map over a DataFrame in Spark to extract RowData and make predictions using an H2O MOJO model

I have a saved h2o model in mojo format, and now I am trying to load it and use it to make predictions on a new dataset (df) as part of a spark app written in scala. Ideally, I wish to append a new row to the existing DataFrame containing the class…

scala apache-spark h2o sparkling-water

asked Dec 15 '17 at 08:34

renegademonkey

2 answers

Adding additional data to each row in an H2OFrame

I am working with a huge H2OFrame (~150 GB, ~200 million rows), which I need to manipulate a little. To be more specific: I have to use the frame's ip column to find the location/city names for each IP and add this information to each of the frame's…

python h2o sparkling-water

asked Sep 20 '17 at 08:30

ksbg

1 answer

Sparkling Water often throws java.lang.ArrayIndexOutOfBoundsException: 65535

H2O Sparkling Water often throws the exception below, and we rerun the job manually whenever this happens. The issue is that the Spark job doesn't exit when this exception occurs; it doesn't return an exit status, so we are not able to automate this process.…

apache-spark apache-spark-mllib h2o apache-spark-ml sparkling-water

asked Apr 19 '17 at 17:46

DINESHKUMAR MURUGAN

1 answer

H2O Spark streaming 2.1 distribution

I have been intermittently getting a distribution error when running a sample Iris model in Sparkling Water. Sparkling Water: 2.1. Spark Streaming Kafka: 0.10.0.0. Running locally using spark-submit (only master). DistributedException from xxx:54321,…

spark-streaming h2o sparkling-water

asked Mar 30 '17 at 14:00

Lalit Agarwal

1 answer

Input line is too long - Spark

I am getting the following error while executing the sparkling-shell2.cmd batch file. I walked through it and I am getting this error while executing spark-shell.cmd with the following parameters: cd %TOPDIR% %SPARK_HOME%/bin/spark-shell.cmd --jars…

apache-spark h2o sparkling-water

asked Feb 22 '17 at 07:19

Mansoor

3 answers

Use sparkling-water via Spark packages: com.google.guava... not found

I'm trying to use H2O.ai's sparkling-water via spark packages. I'm following their guide: https://github.com/h2oai/sparkling-water#use-sparkling-water-via-spark-packages I'm on Hortonworks HDP 2.4 with Scala 2.10 and Spark 1.6.1. I put the following…

apache-spark sparkling-water

asked Feb 17 '17 at 02:50

BlueFeet
