
After installing Spark 2.3 and setting the following environment variables in .bashrc (using Git Bash):

  1. HADOOP_HOME

  2. SPARK_HOME

  3. PYSPARK_PYTHON

  4. JDK_HOME
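
For reference, the .bashrc entries look like this (the paths below are hypothetical placeholders, not my exact install locations):

# Hypothetical example locations; adjust to the actual ones
export HADOOP_HOME='C:/hadoop'
export SPARK_HOME='C:/spark/spark-2.3.0-bin-hadoop2.7'
export PYSPARK_PYTHON='C:/Python36/python.exe'
export JDK_HOME='C:/Program Files/Java/jdk1.8.0'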

executing $SPARK_HOME/bin/spark-submit displays the following error:

Error: Could not find or load main class org.apache.spark.launcher.Main

I did some research on Stack Overflow and other sites, but could not figure out the problem.

Execution environment

  1. Windows 10 Enterprise
  2. Spark version - 2.3
  3. Python version - 3.6.4

Can you please provide some pointers?


3 Answers


I had that error message. It can have several root causes, but this is how I investigated and solved the problem (on Linux):

  • instead of launching spark-submit, try bash -x spark-submit to see which line fails.
  • repeat that process several times (since spark-submit calls nested scripts) until you find the underlying command being called; in my case it was something like:

/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp '/opt/spark-2.2.0-bin-hadoop2.7/conf/:/opt/spark-2.2.0-bin-hadoop2.7/jars/*' -Xmx1g org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name 'Spark shell' spark-shell

So, spark-submit launches a java process but can't find the org.apache.spark.launcher.Main class using the files in /opt/spark-2.2.0-bin-hadoop2.7/jars/* (see the -cp option above). I ran ls in this jars folder and counted 4 files instead of the whole Spark distribution (~200 files). It was probably a problem during the installation process, so I reinstalled Spark, checked the jars folder, and it worked like a charm.

So, you should:

  • check the java command (the -cp option)
  • check your jars folder (does it contain at least all the spark-*.jar files?)
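
A quick sanity check along those lines (a minimal sketch; assumes SPARK_HOME points at your install):

# A healthy Spark 2.x distribution ships on the order of 200 jars,
# not a handful
ls "$SPARK_HOME"/jars | wc -l

# The core Spark jars, including the launcher, should all be present
ls "$SPARK_HOME"/jars/spark-*.jar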

Hope it helps.


Verify the steps below:

  1. Is spark-launcher_*.jar present in the $SPARK_HOME/jars folder?
  2. Extract spark-launcher_*.jar to verify whether it contains Main.class (see the commands below).
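
A minimal way to check both, assuming a JDK's jar tool is on the PATH:

# Step 1: is the launcher jar there at all?
ls "$SPARK_HOME"/jars/spark-launcher_*.jar

# Step 2: list the jar's contents and look for the class from the error
jar tf "$SPARK_HOME"/jars/spark-launcher_*.jar | grep launcher/Main.class
# a healthy jar prints: org/apache/spark/launcher/Main.class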

If both are true, then you may be running spark-submit on Windows using a Cygwin terminal.

Try using spark-submit.cmd instead. Also, Cygwin renders drives like /c/, which does not work on Windows, so it is important to provide absolute paths for the environment variables qualified with 'C:/' and not '/c/'.
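
For example, in .bashrc (the install directory below is a hypothetical placeholder):

# Wrong: Cygwin-style drive prefix, which Windows processes cannot resolve
# export SPARK_HOME='/c/spark/spark-2.3.0-bin-hadoop2.7'

# Right: Windows-style drive qualifier
export SPARK_HOME='C:/spark/spark-2.3.0-bin-hadoop2.7'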

  1. Check that your Spark home directory contains all folders and files (xml, jars, etc.); otherwise reinstall Spark.
  2. Check that your JAVA_HOME and SPARK_HOME environment variables are set in your .bashrc file; try setting them as below:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

export SPARK_HOME=/home/ubuntu-username/spark-2.4.8-bin-hadoop2.6/

Or, wherever your Spark download is located:

export SPARK_HOME=/home/Downloads/spark-2.4.8-bin-hadoop2.6/

Once done, save your .bashrc, run the bash command in the terminal (or restart the shell), and try spark-shell.
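
A minimal sketch of that last step (assuming the exports above went into ~/.bashrc):

# Reload the updated environment without restarting the shell
source ~/.bashrc

# Confirm the variables resolve, then try Spark again
echo "$JAVA_HOME" "$SPARK_HOME"
"$SPARK_HOME"/bin/spark-shell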