
I am following the Sparkling Water setup steps from http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/0/index.html.

Running the following in the terminal:

~/InstallFile/SparklingWater/sparkling-water-2.2.0$ bin/sparkling-shell --conf "spark.executor.memory=1g"

gives the error:

Please setup SPARK_HOME variable to your Spark installation

How do I set the SPARK_HOME variable correctly?

OneCricketeer
roshan_ray

3 Answers


You should install Spark and set the SPARK_HOME variable. In a Unix terminal, run the following to set the variable:

export SPARK_HOME="/path/to/spark"

To make this setting persistent, append the line above to the end of your ~/.bashrc.

See https://www.tutorialspoint.com/apache_spark/apache_spark_installation.htm for installation instructions.
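To confirm the variable is visible to new processes, here is a quick check from Python (just a sanity-check sketch, not part of the original answer):

import os

# Should print the Spark installation path; None means the export has not
# taken effect in the shell that launched this Python session.
print(os.environ.get('SPARK_HOME'))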

Jader Martins

When using Jupyter Notebook with Anaconda, the findspark package is what locates Spark; its find() function in findspark.py does the following:

import os

def find():
    spark_home = os.environ.get('SPARK_HOME', None)

    if not spark_home:
        for path in [
            '/usr/local/opt/apache-spark/libexec', # OS X Homebrew
            '/usr/lib/spark/' # AWS Amazon EMR
            # Any other common places to look?
        ]:
            if os.path.exists(path):
                spark_home = path
                break

    if not spark_home:
        raise ValueError("Couldn't find Spark, make sure SPARK_HOME env is set"
                         " or Spark is in an expected location (e.g. from homebrew installation).")

    return spark_home
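For example, here is a minimal usage sketch of findspark in a notebook (assuming the findspark package is installed, e.g. with pip); it raises the ValueError above when Spark cannot be located:

import findspark

# Locate Spark: uses SPARK_HOME if it is set, otherwise the known
# locations listed in find() above.
spark_home = findspark.find()

# Add Spark's python/ directories to sys.path so that pyspark is importable
findspark.init(spark_home)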

With that in mind, we are going to follow this procedure.

1. Specify SPARK_HOME and JAVA_HOME

As we have seen in the function above, for Windows we need to specify the locations ourselves. The next function is a slightly modified version of this answer. It is modified because it is also necessary to specify JAVA_HOME, which is the directory where you installed Java. I have also created a spark directory where I moved the downloaded version of Spark that I'm using; for this procedure you can check out this link.

import os 
import sys

def configure_spark(spark_home=None, pyspark_python=None):
    spark_home = spark_home or "/path/to/default/spark/home"
    os.environ['SPARK_HOME'] = spark_home
    os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jre1.8.0_231'

    # Add the PySpark directories to the Python path:
    sys.path.insert(1, os.path.join(spark_home, 'python'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'pyspark'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'build'))

    # If PySpark isn't specified, use currently running Python binary:
    pyspark_python = pyspark_python or sys.executable
    os.environ['PYSPARK_PYTHON'] = pyspark_python

configure_spark(r'C:\spark\spark-2.4.4-bin-hadoop2.6')
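With the paths adjusted to your machine (the Spark and Java locations above are just examples), you can check that everything was wired up before moving on; this is a small sketch, not part of the original answer:

import os
import sys

# The environment variables set by configure_spark should be visible to any
# Spark processes launched later, and the PySpark directories should now be
# on sys.path:
print(os.environ['SPARK_HOME'], os.environ['JAVA_HOME'])
print([p for p in sys.path if 'spark' in p.lower()])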

2. Configure SparkContext

When working locally, you should configure the SparkContext in the following way (this link was useful):

import findspark
from pyspark.conf import SparkConf
from pyspark.context import SparkContext

# Find Spark locally
location = findspark.find()
findspark.init(location, edit_rc=True)

# Start a SparkContext
configure = SparkConf().set('spark.driver.host', '127.0.0.1')
sc = SparkContext(master='local', appName='desiredName', conf=configure)
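To verify that the context is usable, a tiny job can be run on it (a small sanity check, not part of the original answer):

# Distribute a small range and sum it on the local context
rdd = sc.parallelize(range(10))
print(rdd.sum())   # expected output: 45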

This procedure has worked out nicely for me. Thanks!

Miguel Trejo

You will have to download the Spark runtime on the machine where you want to use Sparkling Water. It can be either a local download or a clustered Spark, i.e. on Hadoop.

The SPARK_HOME variable is the directory/folder where Sparkling Water will find the Spark runtime.

In the following setting of SPARK_HOME, I have Spark 2.1 downloaded on my local machine and the path points to the unzipped Spark 2.1 directory, as below:

SPARK_HOME=/Users/avkashchauhan/tools/spark-2.1.0-bin-hadoop2.6

$ pwd
 /Users/avkashchauhan/tools/sw2/sparkling-water-2.1.14

Now when I launch the sparkling-shell as below it works fine:

~/tools/sw2/sparkling-water-2.1.14 $ bin/sparkling-shell

-----
  Spark master (MASTER)     : local[*]
  Spark home   (SPARK_HOME) : /Users/avkashchauhan/tools/spark-2.1.0-bin-hadoop2.6
  H2O build version         : 3.14.0.2 (weierstrass)
  Spark build version       : 2.1.1
  Scala version             : 2.11
----
AvkashChauhan