
I already have Hadoop 3.0.0 installed. Should I now install the with-hadoop or without-hadoop version of Apache Spark from this page?

I am following this guide to get started with Apache Spark.
It says:

Download the latest version of Apache Spark (Pre-built according to your Hadoop version) from this link:...

But I am confused. If I already have an instance of Hadoop running on my machine, and I then download, install, and run Apache Spark with Hadoop, won't it start an additional instance of Hadoop?

MozenRath
JBel

2 Answers


First off, Spark does not yet support Hadoop 3, as far as I know. You can see this on the download page: none of the "your Hadoop version" options matches Hadoop 3.

You can try setting HADOOP_CONF_DIR and HADOOP_HOME in your spark-env.sh, though, regardless of which you download.

You should always download the version without Hadoop if you already have it.
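As a rough sketch of what that configuration could look like in conf/spark-env.sh: the /opt/hadoop path below is only a placeholder assumption for wherever your Hadoop lives, and the SPARK_DIST_CLASSPATH line is what Spark's documentation describes for the "without Hadoop" build, so that Spark can find your existing Hadoop jars. This does not make an unsupported Hadoop version work; it only points Spark at the installation you already have.

```shell
# conf/spark-env.sh (sketch; the paths are assumptions, adjust to your install)

# Root of the existing Hadoop installation. /opt/hadoop is a placeholder default.
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"

# Directory holding core-site.xml, hdfs-site.xml and yarn-site.xml.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"

# The "without Hadoop" build ships no Hadoop jars, so put the existing ones
# on Spark's classpath. Skipped if the hadoop launcher is not found.
if [ -x "$HADOOP_HOME/bin/hadoop" ]; then
  export SPARK_DIST_CLASSPATH="$("$HADOOP_HOME/bin/hadoop" classpath)"
fi
```

Spark sources this file when it starts, so nothing here launches Hadoop; it only tells Spark where to look.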

won't it start another additional instance of Hadoop?

No. You still would need to explicitly configure and start that version of Hadoop.

I believe the with-Hadoop download is already configured to use the Hadoop it includes.

OneCricketeer

This is in addition to the answer by @cricket_007.

If you have Hadoop installed, you would normally not download Spark with Hadoop. However, since your Hadoop version is not yet supported by any version of Spark, you will need to download the one with Hadoop. You will then need to configure the bundled Hadoop version on your machine for Spark to run on. This means that all your data on Hadoop 3 will be LOST, so if you need this data, back it up before beginning your downgrade/re-configuration. I do not think you will be able to host two instances of Hadoop on the same system, because of certain environment variables.

MozenRath