I need to install Spark and run it in standalone mode on a single machine, and I am looking for a straightforward way to install it via apt-get.

I found how to do this with pyspark via pip here.

I cannot find any way to install Spark with a single terminal command.

Does installing pyspark install all of the Spark software?

I found instructions for how to install Spark, but the process is more complex.

Is there a way to install Spark with a similar single terminal command via apt-get?

EDIT

I found this post explaining how to install Spark using apt-get, but I get the following error:

E: Unable to locate package spark-core
E: Unable to locate package spark-master
E: Unable to locate package spark-worker
E: Unable to locate package spark-history-server
E: Unable to locate package spark-python

Thanks

thebeancounter
  • Check this link http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/ – Víctor López Jul 24 '17 at 09:35
  • thanks @VíctorLópez, but this is not via apt-get, and it does not explain whether pyspark is a substitute for Spark – thebeancounter Jul 24 '17 at 10:32
  • Could you please post the output of the `sudo apt-get install spark` command? I just executed it on my Linux Mint (based on Ubuntu) and it worked without a problem. – kchomski Jul 24 '17 at 13:28
  • @kchomski see edit – thebeancounter Jul 24 '17 at 13:46
  • @kchomski `sudo apt-get install spark` will install the "SPARK programming language toolset" based on the Ada programming language (see https://packages.ubuntu.com/artful/devel/spark). This is not(!) "Apache Spark". – asmaier Sep 06 '17 at 12:16

1 Answer

Before installing pyspark you must install Java 8.

For a fully automatic installation of Java 8 on Ubuntu, run:

$ apt-get update
$ apt-get -y install software-properties-common   # provides add-apt-repository
$ add-apt-repository -y ppa:webupd8team/java      # PPA that ships the Oracle Java 8 installer
$ echo debconf shared/accepted-oracle-license-v1-1 select true | debconf-set-selections   # pre-accept the Oracle license
$ echo debconf shared/accepted-oracle-license-v1-1 seen true | debconf-set-selections     # mark the license prompt as seen
$ apt-get update
$ apt-get -y install oracle-java8-installer       # unattended Java 8 install

(see https://newfivefour.com/docker-java8-auto-install.html)

Afterwards you can simply run `pip install pyspark`.
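
To verify that the pip installation works, you can create a `SparkContext` in local mode. This is only a minimal smoke test, assuming Java 8 is on the PATH; by default pyspark uses the master URL `local[*]`, i.e. everything runs inside the current process:

import pyspark

sc = pyspark.SparkContext(master="local[*]")  # run Spark locally, one worker thread per core
print(sc.parallelize(range(100)).sum())       # should print 4950
sc.stop()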

asmaier
  • could you explain how to download a complete Spark via terminal? I mean this one: https://spark.apache.org/downloads.html – thebeancounter Sep 07 '17 at 11:50
  • You don't need to download spark if you are using `pip install pyspark`. – asmaier Sep 07 '17 at 12:08
  • you cannot use the full functionality of Spark with pyspark; for instance, you can't start a master or a slave on your machine using pyspark, you can only use an existing Spark cluster that has already been set up on another machine – thebeancounter Sep 07 '17 at 12:14
  • I was able to use pyspark using `import pyspark; sc=pyspark.SparkContext()`. This will by default use the master URL `local[*]`. (see https://spark.apache.org/docs/latest/submitting-applications.html#master-urls) I didn't need to set up a Spark cluster to be able to do that. – asmaier Sep 07 '17 at 12:21
  • In case you still want to download the full Spark via terminal, do `curl -LJO "https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz"` – asmaier Sep 08 '17 at 16:01
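
To make the distinction from the comments above concrete: a pip-installed pyspark can run jobs in local mode out of the box, while standalone mode needs a master started separately from a full Spark download (e.g. with `sbin/start-master.sh`). The sketch below assumes such a master is already running; the URL `spark://localhost:7077` is a placeholder for wherever your master actually listens:

import pyspark

# Local mode -- no cluster needed, everything runs in this process.
sc = pyspark.SparkContext(master="local[*]")
print(sc.parallelize(range(10)).count())  # 10
sc.stop()

# Standalone mode -- attach to an existing master (placeholder URL, assumption:
# a master was started from a full Spark distribution, e.g. via sbin/start-master.sh).
sc = pyspark.SparkContext(master="spark://localhost:7077")
print(sc.parallelize(range(10)).count())  # 10, now executed on the cluster
sc.stop()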