
I am trying to run Spark after installing it, but the command "spark-shell" gives the error:

Could not find or load main class version.

I tried to fix this by setting my JAVA_HOME in various (perhaps contradictory) ways. I also set SCALA_HOME and edited spark-env.sh. What steps may I take to fix this?
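
For reference, this is the kind of check that can be run here: Spark's launch scripts are plain bash and invoke "$JAVA_HOME/bin/java", so tracing the launcher shows which command actually fails (a minimal sketch, assuming a standard tarball install):

# Spark's scripts run "$JAVA_HOME/bin/java"; this should print a version banner:
"$JAVA_HOME/bin/java" -version

# Trace the top-level launcher to see the exact command that fails:
bash -x "$SPARK_HOME/bin/spark-shell" 2>&1 | tail -n 20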

Similar to:

This question (that one is for Windows and concerns spark-submit rather than the spark-shell command; I am using Ubuntu 20.04) and this question (that error is different from mine, but similar).

Version information (I am working on Ubuntu 20.04):

Hadoop: 2.10.0
Spark: spark-2.4.5-bin-without-hadoop-scala-2.12
Scala: 2.11.12 (previously I tried Scala 2.12, as I thought that was the compatible version)
Java: openjdk version 1.8.0_252; runtime build 1.8.0_252-8u252-b09-1ubuntu1-b09; OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode); javac 1.8.0_252
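
For reference, the versions above come from the standard version commands:

java -version
javac -version
scala -version
hadoop version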

Details of steps I have taken:

I have installed Hadoop (extracted the program files to usr/hadoop, configured the namenode and datanode, and set the Java path), along with Java 1.8 and Scala. Hadoop works fine: I can see the namenode and Hadoop jobs in my browser.
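
A quick way to confirm the Hadoop daemons are actually running is jps, which ships with the JDK:

jps
# Expected output includes NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager when Hadoop is healthy.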

I have installed Spark (extracted the program files to usr/Spark).

In spark-env.sh I have set:

export HADOOP_CONF_DIR=/home/sperling/hadoop/hadoop-2.10.0
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
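
Note that, per Spark's documentation on "Hadoop free" builds, a -without-hadoop package also needs spark-env.sh to point at Hadoop's jars. A minimal sketch, with the hadoop binary path assumed from the settings above:

# Hadoop-free Spark builds need the Hadoop jars on their classpath
# (see "Using Spark's 'Hadoop Free' Build" in the Spark docs):
export SPARK_DIST_CLASSPATH=$(/home/sperling/hadoop/hadoop-2.10.0/bin/hadoop classpath)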

In bashrc I have set:

export SCALA_HOME=/usr/share/scala
export HADOOP_HOME=/home/sperling/hadoop/hadoop-2.10.0
export SPARK_HOME=/home/sperling/spark
export PATH=$PATH:/home/sperling/spark/bin
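
A changed ~/.bashrc only applies to new shells unless it is re-sourced; a quick sanity check:

source ~/.bashrc
echo "$SPARK_HOME"
which spark-shell   # should print /home/sperling/spark/bin/spark-shell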

In /etc/environment I have set:

JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
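
As an aside, JAVA_HOME conventionally names the JDK/JRE directory rather than the java binary itself, since launchers append bin/java to it; the intended line was presumably:

# Directory, not .../jre/bin/java, so that "$JAVA_HOME/bin/java" resolves:
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre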

I do not know what to try next, as it seems that Spark can't find either Java or Scala, yet both show up when I type echo $JAVA_HOME and echo $SCALA_HOME in the terminal.
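
Echoing a variable only shows that it is set; a sketch of checking that each one also points at a real directory (using bash indirect expansion):

# Verify each variable names an existing directory, not just a non-empty string:
for v in JAVA_HOME SCALA_HOME HADOOP_HOME SPARK_HOME; do
  printf '%s=%s ' "$v" "${!v}"
  [ -d "${!v}" ] && echo "(exists)" || echo "(NOT a directory)"
done

# The call Spark's scripts make; it fails if JAVA_HOME ends in .../bin/java:
"$JAVA_HOME/bin/java" -version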


1 Answer


Spark version: spark-2.4.5-bin-without-hadoop-scala-2.12

This means that Spark is pre-built with (or expecting) Scala 2.12; I don't think you'd be able to run it with Scala 2.11.
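
One way to confirm which Scala version a given Spark distribution was built against is to look at the bundled scala-library jar, for example:

ls "$SPARK_HOME/jars" | grep scala-library
# e.g. scala-library-2.12.10.jar for a Scala 2.12 build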

From what I can see on the Spark downloads page, Spark 2.4.5 is provided pre-built with either Hadoop 2.7.x or Hadoop 3.2.x:

https://spark.apache.org/downloads.html

I would suggest either trying it with one of the Hadoop versions they recommend, or installing Hadoop 3.2.x or later.
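
For example, fetching the pre-built Hadoop 2.7 package could look like this (URL assumed from the Apache archive layout):

wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
tar -xzf spark-2.4.5-bin-hadoop2.7.tgz -C /home/sperling
export SPARK_HOME=/home/sperling/spark-2.4.5-bin-hadoop2.7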

  • Thank you. I tried with Scala 2.12.10 and had the same problem. I will install Hadoop 2.7 and Spark 2.4.5 for Hadoop 2.7 and report back. I will use Scala 2.11 with this and Java 8. Does this sound correct? – sperling May 20 '20 at 07:35
  • This should work; I've seen the same configuration up and running with both Scala 2.11 and Scala 2.12. – ZakukaZ May 20 '20 at 10:28
  • The final working configuration for me is: Scala 2.12.11, Hadoop 3.2.1, and Spark 3.0.0 preview 2. The instructions I used are here: https://linuxconfig.org/ubuntu-20-04-hadoop. Is this the correct place to write this? – sperling Jun 03 '20 at 13:43