I have not faced this problem with any other software on my system. I am able to install and run everything in Windows Terminal/Command Prompt and Git Bash.
Recently I started learning Spark. I installed Spark and set everything up: JAVA_HOME, SCALA_HOME, and the Hadoop winutils file. Both spark-shell and the pyspark shell run perfectly in Command Prompt/Windows Terminal, and in Jupyter via the pyspark library.
spark-3.0.1-bin-hadoop2.7
python 3.8.3
Windows 10
git version 2.29.2.windows.2
But I cannot get it working in Git Bash (I also tried with admin permissions). I get this error when I try to run spark-shell or pyspark:
Error: Could not find or load main class org.apache.spark.launcher.Main
/c/Spark/spark-3.0.1-bin-hadoop2.7/bin/spark-class: line 96: CMD: bad array subscript
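My guess (an assumption on my part, not something I have confirmed) is that the second error is just a downstream symptom of the first: if spark-class ends up with an empty command array because the launcher class could not be loaded, indexing that array at position -1 produces exactly this message. A minimal repro of the bash behavior:

```shell
# Assumption: spark-class builds a CMD array from the launcher's output;
# if the launcher fails, the array is empty and LAST becomes -1.
# Indexing an empty bash array at -1 raises "bad array subscript":
out=$(bash -c 'CMD=(); LAST=$(( ${#CMD[@]} - 1 )); echo "${CMD[$LAST]}"' 2>&1) || true
echo "$out"
```

So the "bad array subscript" on line 96 may not need fixing on its own; the root cause would be whatever prevents org.apache.spark.launcher.Main from loading.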
I searched for solutions and found suggestions to set environment variables in .bashrc or spark-env.sh. I set up the following for the pyspark shell:
export JAVA_HOME='/c/Program Files/Java/jdk1.8.0_111'
export SPARK_HOME='/c/Spark/spark-3.0.1-bin-hadoop2.7'
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
export PYSPARK_PYTHON='C:/Users/raman/anaconda3/python'
export PYSPARK_DRIVER_PYTHON='C:/Users/raman/anaconda3/python'
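My working assumption is that Git Bash and the Windows programs it launches (java.exe, python.exe) disagree about path style, since I am mixing /c/... and C:/... forms above. Here is a sketch of the .bashrc I would expect to need (paths are my own; the cygpath calls and the explicit .exe suffix are guesses at the path mismatch, not a confirmed fix):

```shell
# Sketch of ~/.bashrc settings for Git Bash (adjust paths to your installs).
# cygpath ships with Git Bash and converts Windows paths (C:/...) to the
# /c/... form that Git Bash expects on PATH.
export JAVA_HOME="$(cygpath -u 'C:/Program Files/Java/jdk1.8.0_111')"
export SPARK_HOME="$(cygpath -u 'C:/Spark/spark-3.0.1-bin-hadoop2.7')"
export PATH="$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH"
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH"
# Spark launches the interpreter itself, so give it a Windows-style path
# with an explicit .exe (my assumption about what it expects):
export PYSPARK_PYTHON='C:/Users/raman/anaconda3/python.exe'
export PYSPARK_DRIVER_PYTHON="$PYSPARK_PYTHON"
```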
That didn't work either. If I trace the error back through the spark-class file, it points to line 96.
My questions:
- What is the reason for this error, and how can I resolve it?
- Are there any well-defined steps to set up spark-shell in Git Bash on Windows? (I could not find anything solid online.)
Thanks.