4

I previously had PySpark installed as a Python package through pip. I recently uninstalled it, set up a clean version of Python, and downloaded the standalone Spark distribution.

In my User variables I created a variable with the name: SPARK_HOME

with a value of: C:\spark-2.3.2-bin-hadoop2.7\bin

In System variables, under Path, I added an entry: C:\spark-2.3.2-bin-hadoop2.7\bin
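
For reference, here is a quick sketch of how I check what a Python session actually sees for these variables (the paths are the ones from my setup above):

    import os

    # Show what the current session sees for SPARK_HOME and whether
    # Spark's bin folder is on PATH (paths are from my machine).
    print("SPARK_HOME =", os.environ.get("SPARK_HOME"))

    spark_bin = r"C:\spark-2.3.2-bin-hadoop2.7\bin"
    path_entries = os.environ.get("PATH", "").split(os.pathsep)
    print("Spark bin on PATH:",
          any(p.strip().rstrip("\\").lower() == spark_bin.lower() for p in path_entries))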

When I run pyspark it does not start, and I cannot run spark-shell either. Any ideas?

  • 1
    Firstly, SPARK_HOME should be set without bin: C:\spark-2.3.2-bin-hadoop2.7\. You add \bin only to the Path entry in System variables. Did you add the JDK as JAVA_HOME? If yes, did you also set JAVA_HOME in hadoop_env.cmd? – pvy4917 Oct 09 '18 at 17:12
  • huh, removing bin from SPARK_HOME and JAVA_HOME fixed it. Thank you! – Michael Naples Oct 09 '18 at 17:22

2 Answers

3

SPARK_HOME should be set without the bin folder. Hence,

Set SPARK_HOME to C:\spark-2.3.2-bin-hadoop2.7\
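
If you also want to use this standalone download from a Python script (rather than the pyspark launcher), here is a minimal sketch; it assumes the install path from the question, whatever py4j version ships under python\lib, and that Java is already reachable on your machine:

    import glob
    import os
    import sys

    # Minimal sketch: point SPARK_HOME at the Spark root (no trailing \bin)
    # and put the bundled Python bindings on sys.path for this session.
    # Assumes Java is installed (JAVA_HOME set or java on PATH).
    spark_home = r"C:\spark-2.3.2-bin-hadoop2.7"
    os.environ["SPARK_HOME"] = spark_home

    sys.path.insert(0, os.path.join(spark_home, "python"))
    # py4j ships as a versioned zip under python\lib; pick whichever one is there.
    sys.path.insert(0, glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0])

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
    print(spark.version)
    spark.stop()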

pvy4917
  • On my word, that took a day and an evening to figure out. Judging by the number of people having issues, Spark and other library creators really ought to provide *clear* installation instructions. – Denis G. Labrecque Apr 09 '21 at 15:45
2

Windows users have to download a compatible winutils.exe version and save it in Spark's bin folder.

Find the version that matches your Spark build's Hadoop distribution, download it, and save it in your Spark folder.

e.g. download "https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe" and save it in "C:\spark-2.3.2-bin-hadoop2.7\bin".

Different winutils versions can be found at this link: https://github.com/steveloughran/winutils
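
If you prefer to script the download, here is a minimal Python sketch; it assumes the raw-file URL behind the blob link above and the install path from the question:

    import urllib.request
    from pathlib import Path

    # Fetch winutils.exe built for Hadoop 2.7.x into Spark's bin folder.
    # The /raw/ URL returns the binary itself; the /blob/ link above is the HTML page.
    url = ("https://github.com/steveloughran/winutils/raw/master/"
           "hadoop-2.7.1/bin/winutils.exe")
    dest = Path(r"C:\spark-2.3.2-bin-hadoop2.7") / "bin" / "winutils.exe"

    urllib.request.urlretrieve(url, str(dest))
    print("saved", dest, dest.stat().st_size, "bytes")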

PPK