13

I have a VM that I access from Ubuntu, and the VM itself also runs Ubuntu 14.04. I need to install Apache Spark as soon as possible, but I cannot find anything that helps me or points me to references where it is best explained. I once tried to install it on my local Ubuntu 14.04 machine and it failed, but the thing is that I don't want to install it on a cluster. Any help please?

JPerk
  • 325
  • 1
  • 2
  • 12
  • The easiest way is to download one of their pre-built versions, unzip it, and you are set to go. – ayan guha May 27 '15 at 13:32
  • @ayanguha So that means I will first have to install Hadoop in order to proceed with installing Spark pre-built for some Hadoop version?? – JPerk May 27 '15 at 13:35
  • @ayanguha And do you have any idea how I can install it? Because I am working in my Virtual Machine – JPerk May 27 '15 at 13:37
  • 1
    No, you do not need Hadoop. You just get their pre-built version and follow the instructions. If you are using Python, I can give you a step-by-step process for learning. For prod deployment, you had better follow the deployment guidelines on the Spark site. – ayan guha May 27 '15 at 14:15

5 Answers

24

You can install and start using Spark in three easy steps:

  • Download the latest version of Spark from here.
  • Navigate to the downloaded folder from the terminal and run the following command:

    tar -xvf spark-x.x.x.tgz        //replace x's with your version
    
  • Navigate to the extracted folder and run one of the following commands:

    ./bin/spark-shell               // for interactive scala shell
    ./bin/pyspark                   // for interactive python shell
    

You are now ready to play with Spark.
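
If you want a quick smoke test beyond the interactive shells, the pre-built packages also bundle example jobs and a run-example launcher; a minimal sketch (the argument 10 is just an illustrative number of partitions):

    ./bin/run-example SparkPi 10    // runs the bundled SparkPi example locally

If it prints an approximation of Pi near the end of the output, the installation works.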

gsamaras
  • 71,951
  • 46
  • 188
  • 305
karthik manchala
  • 13,492
  • 1
  • 31
  • 55
  • And what is required in case of using Java instead of Scala and Python?? – JPerk May 27 '15 at 21:58
  • @PetraRichmond check [this](https://spark.apache.org/docs/latest/programming-guide.html).. – karthik manchala May 28 '15 at 09:41
  • There is an error on Ubuntu. This may be because of Ubuntu's interesting Java environment. ubuntu@ip-172-31-60-32:~/Downloads/spark-1.4.1$ ./bin/pyspark JAVA_HOME is not set – Geoffrey Anderson Sep 04 '15 at 13:34
  • @GeoffreyAnderson you need to set your JAVA_HOME for this.. you can follow [this link](http://stackoverflow.com/questions/17287542/setting-java-home-path-on-ubuntu) – karthik manchala Sep 04 '15 at 14:19
  • I figured it out by including the Spark dependency in my Maven project; that was the easiest way. I created a Maven project and then inserted the Spark dependency into the pom.xml file. That was how I made it work. Otherwise it was impossible. – JPerk Sep 29 '15 at 21:37
  • Yes.. You can do that also.. That is for using the Java API. And you will have to do the above method for configuring your Spark master to something other than `local` :) – karthik manchala Sep 30 '15 at 04:05
  • What exactly should be downloaded in the 1st step please? :) – gsamaras Feb 10 '16 at 12:47
  • 2
    @gsamaras choose the latest Spark version.. and if you don't want to build Spark explicitly, you can choose to download the pre-built version of Spark with Hadoop (does not require installing Hadoop) – karthik manchala Feb 10 '16 at 12:51
  • Thank you @karthikmanchala. I am however experiencing a problem: http://askubuntu.com/questions/732050/unable-to-launch-spark – gsamaras Feb 10 '16 at 13:32
  • Yes @karthikmanchala, new error is here: http://stackoverflow.com/questions/35318343/sbt-errors-in-spark – gsamaras Feb 10 '16 at 14:51
6

The process to follow is mainly this:

Make sure you have version 7 or 8 of the Java Development Kit installed.

Next, install Scala.
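
On Ubuntu 14.04, those two steps might look roughly like this (a sketch, not the only option; the OpenJDK package name and the Scala version/URL are assumptions you should adapt to your setup):

# Install a JDK (OpenJDK 7 here; Oracle JDK 7/8 also works)
$ sudo apt-get update
$ sudo apt-get install openjdk-7-jdk

# Download a Scala binary release and unpack it, e.g. under /usr/local
$ wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
$ sudo tar xvf scala-2.10.4.tgz -C /usr/local/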

Then add the following at the end of the ~/.bashrc file:

export SCALA_HOME=<path to Scala home>
export PATH=$SCALA_HOME/bin:$PATH

Then reload .bashrc:

$ . .bashrc

Next, install git; the Spark build depends on it.

sudo apt-get install git

Finally, download the Spark distribution from here:

$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0.tgz
$ tar xvf spark-1.4.0.tgz 

Building

SBT (Simple Build Tool), which is bundled with Spark, is used for building it. To compile the code:

$ cd spark-1.4.0
$ build/sbt assembly

Building takes some time.

Refer to this blog post, where you can find more detailed steps to install Apache Spark on Ubuntu 14.04.

prabeesh
  • 935
  • 9
  • 11
5

This post explains the detailed steps to set up Apache Spark 2.0 on an Ubuntu/Linux machine. To run Spark, the machine should have Java and Scala installed. Spark can be installed with or without Hadoop; this post deals only with installing Spark 2.0 standalone. Installing Spark 2.0 over Hadoop is explained in another post. We will also cover how to install Jupyter notebooks for running Spark applications in Python with the pyspark module. So, let's start by checking for and installing Java and Scala.

$ scala -version
$ java -version

These commands should print the versions if Scala and Java are already installed; otherwise, you can install them by using the following commands (note that the oracle-java8-installer package comes from a third-party PPA that has to be added first; installing an OpenJDK 8 package is an alternative).

$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
$ sudo mkdir /usr/local/scala
$ sudo tar xvf scala-2.10.4.tgz -C /usr/local/scala/

You can again check with the -version commands whether Java and Scala are installed properly. Scala should display something like Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL, and Java should display something like java version "1.8.0_101" Java(TM) SE Runtime Environment (build 1.8.0_101-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.101-b14, mixed mode). Then update the .bashrc file by adding these lines at the end.

export SCALA_HOME=/usr/local/scala/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH

And reload .bashrc by using this command

$ . .bashrc

Installing Spark: First download Spark from https://spark.apache.org/downloads.html with these options: Spark release: 2.0.0, package type: pre-built with Hadoop 2.7, and direct download.

Now, go to $HOME/Downloads and use the following commands to extract the Spark tar file and move it to the given location.

$ cd $HOME/Downloads/
$ tar xvf spark-2.0.0-bin-hadoop2.7.tgz
$ sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark

Add the following lines to the ~/.bashrc file. This adds the location where the Spark software files live to the PATH variable.

export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH

Again reload the environment by using source ~/.bashrc or

. .bashrc

Now you can start the Spark shells by using these commands:

$ spark-shell    # for starting the Scala API
$ pyspark        # for starting the Python API
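
The introduction mentions Jupyter notebooks for pyspark, but the steps are not shown above. One common way (a sketch, assuming Jupyter is already installed, for example via pip) is to point PySpark's driver Python at Jupyter before launching it:

$ pip install jupyter                          # only if Jupyter is not installed yet
$ export PYSPARK_DRIVER_PYTHON=jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
$ pyspark                                      # now opens a notebook; SparkContext is available as sc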
Abir J.
  • 51
  • 1
  • 2
0

You can start by going to http://spark.apache.org/downloads.html to download Apache Spark. If you don't have an existing Hadoop cluster/installation you need to run against, you can select any of the options. This will give you a .tgz file you can extract with tar -xvf [filename]. From there you can launch the Spark shell and get started in local mode. There is more information in the getting started guide at http://spark.apache.org/docs/latest/ .
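
As a compact sketch of those steps (the file and folder names below are only examples; use whichever package you actually downloaded):

$ tar -xvf spark-2.0.0-bin-hadoop2.7.tgz      # extract the downloaded package
$ cd spark-2.0.0-bin-hadoop2.7
$ ./bin/spark-shell --master local[2]         # start the shell in local mode with 2 worker threads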

Holden
  • 7,392
  • 1
  • 27
  • 33
0

I made it work by creating a Maven project and then adding the Spark dependency to the pom.xml file. That was how it worked for me, because I had to program in Java rather than Scala.

JPerk
  • 325
  • 1
  • 2
  • 12