Sparkling Water integrates H2O's fast, scalable machine learning engine with Spark.
According to the Sparkling Water GitHub repository, it provides:
- Utilities to publish Spark data structures (RDDs, DataFrames) as H2O frames and vice versa
- A DSL to use Spark data structures as input for H2O's algorithms
- Basic building blocks to create ML applications utilizing the Spark and H2O APIs
- A Python interface enabling use of Sparkling Water directly from PySpark
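As an illustration of the first and last points, here is a minimal sketch of a round trip between a Spark DataFrame and an H2O frame from PySpark. It assumes a recent PySparkling release in which `H2OContext.getOrCreate()` takes no Spark session argument and the conversion methods are named `asH2OFrame` and `asSparkFrame`; older releases use different names and signatures, so check the documentation for your version. The function name `round_trip` and the sample data are hypothetical.

```python
def round_trip(spark):
    """Publish a small Spark DataFrame as an H2O frame and convert it back.

    `spark` is an existing SparkSession. Imports are done inside the
    function so this sketch can be loaded without PySparkling installed.
    """
    # Assumption: the PySparkling package is installed and matches the Spark version.
    from pysparkling import H2OContext

    # Start (or attach to) the H2O cluster running inside the Spark application.
    hc = H2OContext.getOrCreate()

    # Hypothetical sample data, used only for illustration.
    spark_df = spark.createDataFrame([(1, 2.0), (2, 4.0)], ["id", "value"])

    # Spark DataFrame -> H2O frame (usable as input for H2O algorithms).
    h2o_frame = hc.asH2OFrame(spark_df)

    # H2O frame -> Spark DataFrame.
    return hc.asSparkFrame(h2o_frame)
```

Calling `round_trip(spark)` inside a PySpark session that has Sparkling Water on its classpath should return a Spark DataFrame with the same two columns.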
Getting Started
- Select the right version
Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release; for example, for Spark 1.6 use the Sparkling Water 1.6 branch.
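The matching rule above can be sketched as a small helper that reduces a full Spark version string to the release line used to pick a Sparkling Water branch. The function name `spark_release_line` is hypothetical and is not part of Sparkling Water; the sketch only assumes that Spark versions start with `major.minor`.

```python
def spark_release_line(spark_version: str) -> str:
    """Return the 'major.minor' release line of a Spark version string."""
    parts = spark_version.strip().split(".")
    # Require at least numeric major and minor components.
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"Unrecognized Spark version: {spark_version!r}")
    return f"{parts[0]}.{parts[1]}"


# Spark 1.6.2 belongs to the 1.6 line, so the 1.6 Sparkling Water branch applies.
print(spark_release_line("1.6.2"))
```

In a running PySpark session the version string is available as `sc.version`, so `spark_release_line(sc.version)` gives the release line to look for.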
Recommended reference sources:
Sparkling Water installation guide
Sparkling Water documentation
Sparkling Water GitHub documentation