
I am trying to run spark-submit, but it's failing with a weird message.

 Error: Could not find or load main class org.apache.spark.launcher.Main
 /opt/spark/bin/spark-class: line 96: CMD: bad array subscript

This is the first time I am seeing this kind of error. I tried to check the code in the spark-class file, but I was unable to figure out what is causing the issue.

# Turn off posix mode since it does not allow process substitution
set +o posix
CMD=()
DELIM=$'\n'
CMD_START_FLAG="false"
while IFS= read -d "$DELIM" -r ARG; do
  if [ "$CMD_START_FLAG" == "true" ]; then
    CMD+=("$ARG")
  else
    if [ "$ARG" == $'\0' ]; then
      # After NULL character is consumed, change the delimiter and consume command string.
      DELIM=''
      CMD_START_FLAG="true"
    elif [ "$ARG" != "" ]; then
      echo "$ARG"
    fi
  fi
done < <(build_command "$@")

COUNT=${#CMD[@]}
LAST=$((COUNT - 1))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}

The line mentioned in the error message is

LAUNCHER_EXIT_CODE=${CMD[$LAST]}

Any pointer or idea about what is causing this issue would help me a lot.

Thanks

Jonas V
Ashit_Kumar
    @hatefAlipoor yeah I was able to solve the issue by providing an entry point for the code to start – Ashit_Kumar Sep 10 '20 at 15:37
  • Would you mind clarifying what you mean by "providing an entry point for the code to start"? I'm seeing the same problem. This is what I'm seeing in my terminal: "$ pyspark set PYSPARK_SUBMIT_ARGS="--name" "PySparkShell" "pyspark-shell" && python C:\Users\oefel\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyspark/bin/spark-class: line 96: CMD: bad array subscript" Is this what you were seeing? – Erin Sep 14 '20 at 21:53
  • @Erin can you provide a brief description of what you are trying to do? It looks like you are simply submitting a spark-submit job on a Windows machine. Can you ensure the required Spark paths are set? My error was in a different context; it came from a pod in which I was trying to execute my job. – Ashit_Kumar Sep 15 '20 at 08:28
  • I'm looking to correctly set up the SPARK_HOME env variable so as to use pyspark within a Jupyter Notebook. I've set env variables by Max' comment in this post: https://stackoverflow.com/questions/38798816/pyspark-command-not-recognised . I'm now seeing: "/c/spark/spark-2.4.7-bin-hadoop2.7/spark-2.4.7-bin-hadoop2.7/bin/pyspark: line 24: C:spark\spark-2.4.7-bin-hadoop2.7/bin/load-spark-env.sh: No such file or directory /c/spark/spark-2.4.7-bin-hadoop2.7/spark-2.4.7-bin-hadoop2.7/bin/pyspark: line 77: C:spark\spark-2.4.7-bin-hadoop2.7/bin/spark-submit: No such file or directory" – Erin Sep 15 '20 at 13:46
  • I am getting this simply by trying to run the pyspark shell in the terminal. Any thoughts? It looks like I'm trying to do something similar to what Erin is doing. – jkix May 18 '21 at 17:55
  • I am encountering the same issue while trying to run pyspark in a Cygwin environment. My environment variables are: export SPARK_PYTHON=python export PYSPARK_PYTHON=python export SPARK_HOME='c:/spark/spark-3.1.2' export PATH="$SPARK_HOME/bin:$PATH" export JAVA_HOME='c:/Program Files/Java/jre1.8.0_301' I get the following output: set PYSPARK_SUBMIT_ARGS="--name" "PySparkShell" "pyspark-shell" && python c:/spark/spark-3.1.2/bin/spark-class: line 98: CMD: bad array subscript – Amit Gupta Aug 26 '21 at 20:01

2 Answers


When I faced exactly the same problem, I looked in the bin directory of my Spark setup, which is on the Windows PATH; when you run spark-submit <arguments>, CMD searches for this command in Spark's bin. You can find this directory by executing echo %SPARK_HOME%\bin in CMD. There I saw two copies of every executable, which makes sense because Windows and Linux need different executable scripts. So finally I just typed spark-submit.cmd instead of spark-submit and everything worked as expected.
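For illustration, a quick check from a Windows CMD prompt could look like the sketch below; the application class, master setting, and jar path are placeholders, not values from the question:

REM Placeholders only; substitute your own application class, master, and jar path.
echo %SPARK_HOME%\bin
dir %SPARK_HOME%\bin\spark-submit*
spark-submit.cmd --class org.example.MyApp --master local[*] C:\path\to\my-app.jar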

LinFelix
tia

TL;DR: the bash scripts shipped with Spark aren't designed to work in Cygwin or MSYS2 shell environments on a Windows system. You should use the comparable .cmd scripts instead.

The error message you cite happens because the build_command function (defined in the spark-class script) doesn't behave on a Windows system the way the rest of spark-class expects. build_command calls org.apache.spark.launcher.Main to generate the appropriate launch command line, and the while loop in spark-class expects NUL characters to appear between the command-line entries. On a Windows system, however, those NULs are missing.
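For reference, build_command in spark-class looks roughly like this (paraphrased; exact flags vary between Spark versions). It runs the Java launcher and then appends the launcher's exit code as a final NUL-terminated token:

build_command() {
  # Print the launch command produced by the launcher class, then emit the
  # launcher's exit code so the read loop can pick it up as the last CMD entry.
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
  printf "%d\0" $?
}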

According to the comment on lines 44-50 of https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/Main.java:

   * This class works in tandem with the "bin/spark-class" script on Unix-like systems, and
   * "bin/spark-class2.cmd" batch script on Windows to execute the final command.
   * <p>
   * On Unix-like systems, the output is a list of command arguments, separated by the NULL
   * character. On Windows, the output is a command line suitable for direct execution from the
   * script.
   */

Because of this, the while loop misfires: the CMD array never gets populated, LAST evaluates to -1, and indexing the empty array produces the error message you saw.
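As a minimal sketch of that failure mode (not Spark code, just the relevant bash behavior):

CMD=()                             # nothing was appended by the read loop
COUNT=${#CMD[@]}                   # 0
LAST=$((COUNT - 1))                # -1
LAUNCHER_EXIT_CODE=${CMD[$LAST]}   # bash: CMD: bad array subscript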

This problem doesn't happen when you run the .cmd scripts designed for Windows, although to do so from a bash shell, you might need to make them executable, like so:

$ chmod -R +x "$SPARK_HOME/bin"

Then, assuming $SPARK_HOME/bin is in your PATH, you can (for example) start a REPL session like this:

$ spark-shell.cmd

And similarly for the other bash scripts, use the .cmd alternative:

$ spark-submit.cmd <args>

philwalk