13

I'm very new to the concepts of Big Data and related areas, so sorry if I've made some mistakes or typos.

I would like to understand Apache Spark and use it only on my computer, in a development / test environment. As Hadoop includes HDFS (Hadoop Distributed File System) and other software that only matters for distributed systems, can I discard that? If so, where can I download a version of Spark that doesn't need Hadoop? Here I can find only Hadoop-dependent versions.

What do I need:

  • Run all of Spark's features without problems, but on a single computer (my home computer).
  • Everything that I build on my computer with Spark should run on a future cluster without problems.

Is there any reason to use Hadoop or any other distributed file system for Spark if I will run it on my computer only for testing purposes?

Note that "Can apache spark run without hadoop?" is a different question from mine, because I do want run Spark in a development environment.


2 Answers

14

Yes, you can install Spark without Hadoop. Go through the Spark official documentation: http://spark.apache.org/docs/latest/spark-standalone.html

Rough steps:

  1. Download a precompiled Spark package, or download the Spark source and build it locally
  2. Extract the tar archive
  3. Set the required environment variables
  4. Run the start scripts (see the example commands after this list).
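
A rough sketch of those steps on Linux, assuming the spark-2.2.0-bin-hadoop2.7 package from the link below has already been downloaded; the local[*] master and the standalone start scripts are standard Spark, and the paths are just examples:

# 1-2. Extract the downloaded tarball (file name from the link below)
tar -xzf spark-2.2.0-bin-hadoop2.7.tgz

# 3. Point SPARK_HOME at the extracted directory and put its bin/ on the PATH
export SPARK_HOME="$PWD/spark-2.2.0-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$PATH"

# 4a. Either start a single-machine standalone master and worker ...
"$SPARK_HOME/sbin/start-master.sh"
"$SPARK_HOME/sbin/start-slave.sh" spark://$(hostname):7077

# 4b. ... or skip the cluster entirely and use local mode for development
"$SPARK_HOME/bin/spark-shell" --master 'local[*]'

In local mode nothing talks to HDFS, so plain local file paths work for testing, and the same code can later be pointed at a real cluster by changing only the master URL.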

Spark (without Hadoop) - available on the Spark download page. URL: https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz

If this URL does not work, then try to get it from the Spark download page.

  • Can you please be more specific about the "required environment variable"? I assume it's HADOOP_HOME_DIR, and I would like to know how to set it. I have successfully developed on Windows by downloading HadoopUtils and having HADOOP_HOME_DIR point there, but how should I set it on Linux? I am working on one Linux server where Hadoop is not installed. There is a Hadoop installation on another server. How should I set HADOOP_HOME_DIR? – radumanolescu May 30 '19 at 13:55
  • But it is a contradiction: "spark-2.2.0-bin-hadoop2.7.tgz" is **bin-hadoop2** and there is also a **bin-without-hadoop.tgz** option, so something is wrong here. – Peter Krauss Sep 10 '19 at 19:08
0

This is not a proper answer to the original question. Sorry, it is my fault.


If someone wants to run Spark using the "without Hadoop" distribution tar.gz,

there is an environment variable to set. This spark-env.sh worked for me:

#!/bin/sh
# Add the jars from a separately installed Hadoop to Spark's classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
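
For reference, a minimal sketch of how that file is typically put in place, assuming Hadoop is already installed separately (so the hadoop command is on the PATH) and the standard $SPARK_HOME/conf layout that ships with Spark:

# spark-env.sh lives in Spark's conf directory; a template ships with Spark
cp "$SPARK_HOME/conf/spark-env.sh.template" "$SPARK_HOME/conf/spark-env.sh"

# Append the classpath export so the "without Hadoop" build can find Hadoop's jars
echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> "$SPARK_HOME/conf/spark-env.sh"

# Verify by starting a local-mode shell
"$SPARK_HOME/bin/spark-shell" --master 'local[*]'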
  • So in other words, Spark actually requires Hadoop to run, and Hadoop can be installed either separately or downloaded bundled with Spark, right? – Yar Dec 29 '22 at 14:12
  • Yes, Spark actually requires the Hadoop libraries to run; Spark has a dependency on the Hadoop libraries. Yes, the Hadoop libraries can be installed separately. Yes, the Hadoop libraries are bundled in the "Spark with Hadoop" version. And yes, Spark can run without a Hadoop cluster. – ruseel Mar 28 '23 at 05:38