I'm on Spark 1.4.1. How can I set system environment variables for Python?
For instance, in R:
Sys.setenv(SPARK_HOME = "C:/Apache/spark-1.4.1")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
What about in Python?
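From what I can tell, the Python analogue of Sys.setenv is the os.environ mapping, which would have to be set before the SparkContext is created; for example (the path here is just my install location):

import os

# Python counterpart of R's Sys.setenv(SPARK_HOME = ...):
# mutate this process's environment before pyspark starts the JVM.
os.environ["SPARK_HOME"] = "C:/Apache/spark-1.4.1"

But I'm not sure whether that alone is enough for pyspark.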
Here is the script I'm running, adapted from the Spark SQL example:

import os
import sys

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="PythonSQL")
sqlContext = SQLContext(sc)

# Build the path to the example JSON file from the SPARK_HOME environment variable.
# ref: https://github.com/apache/spark/blob/master/examples/src/main/python/sql.py
if len(sys.argv) < 2:
    path = "file://" + \
        os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")
else:
    path = sys.argv[1]

# Create the DataFrame
df = sqlContext.jsonFile(path)

# Show the content of the DataFrame
df.show()
When I run it, I get this error:

NameError: name 'df' is not defined

Any ideas?
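My guess is that I also need the Python counterpart of the .libPaths() call in the R snippet, i.e. putting the pyspark sources bundled with Spark on sys.path before importing anything from pyspark. Is something like this the right approach? (A sketch of what I have in mind; the py4j zip name is whatever actually ships under python/lib in my install.)

import os
import sys

# Assumption: SPARK_HOME points at my local Spark 1.4.1 install, and the
# py4j filename matches the zip that sits under python/lib there.
os.environ["SPARK_HOME"] = "C:/Apache/spark-1.4.1"
sys.path.append(os.path.join(os.environ["SPARK_HOME"], "python"))
sys.path.append(os.path.join(os.environ["SPARK_HOME"],
                             "python", "lib", "py4j-0.8.2.1-src.zip"))

from pyspark import SparkContext  # should now import cleanly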