
I need to install Spark on a single machine running Ubuntu 14.04. I need it mainly for educational purposes, so I am not very interested in high performance.

I don't have enough background to follow the standalone-mode tutorial at http://spark.apache.org/docs/1.2.0/spark-standalone.html, and I do not understand which version of Spark I should install.

Can someone explain to me, step by step, how to set up a working Spark system on my machine?

EDIT: Following the comments and the current answer, I am able to run the Spark shell and use it.

    donbeo@donbeo-HP-EliteBook-Folio-9470m:~/Applications/spark/spark-1.1.0$ ./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/02/04 10:20:20 INFO SecurityManager: Changing view acls to: donbeo,
15/02/04 10:20:20 INFO SecurityManager: Changing modify acls to: donbeo,
15/02/04 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(donbeo, ); users with modify permissions: Set(donbeo, )
15/02/04 10:20:20 INFO HttpServer: Starting HTTP Server
15/02/04 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 48135.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
15/02/04 10:20:23 WARN Utils: Your hostname, donbeo-HP-EliteBook-Folio-9470m resolves to a loopback address: 127.0.1.1; using 192.168.1.45 instead (on interface wlan0)
15/02/04 10:20:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/02/04 10:20:23 INFO SecurityManager: Changing view acls to: donbeo,
15/02/04 10:20:23 INFO SecurityManager: Changing modify acls to: donbeo,
15/02/04 10:20:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(donbeo, ); users with modify permissions: Set(donbeo, )
15/02/04 10:20:23 INFO Slf4jLogger: Slf4jLogger started
15/02/04 10:20:23 INFO Remoting: Starting remoting
15/02/04 10:20:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.45:34171]
15/02/04 10:20:23 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@192.168.1.45:34171]
15/02/04 10:20:23 INFO Utils: Successfully started service 'sparkDriver' on port 34171.
15/02/04 10:20:23 INFO SparkEnv: Registering MapOutputTracker
15/02/04 10:20:23 INFO SparkEnv: Registering BlockManagerMaster
15/02/04 10:20:24 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150204102024-1e7b
15/02/04 10:20:24 INFO Utils: Successfully started service 'Connection manager for block manager' on port 44926.
15/02/04 10:20:24 INFO ConnectionManager: Bound socket to port 44926 with id = ConnectionManagerId(192.168.1.45,44926)
15/02/04 10:20:24 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/02/04 10:20:24 INFO BlockManagerMaster: Trying to register BlockManager
15/02/04 10:20:24 INFO BlockManagerMasterActor: Registering block manager 192.168.1.45:44926 with 265.4 MB RAM
15/02/04 10:20:24 INFO BlockManagerMaster: Registered BlockManager
15/02/04 10:20:24 INFO HttpFileServer: HTTP File server directory is /tmp/spark-58772693-4106-4ff0-a333-6512bcfff504
15/02/04 10:20:24 INFO HttpServer: Starting HTTP Server
15/02/04 10:20:24 INFO Utils: Successfully started service 'HTTP file server' on port 51677.
15/02/04 10:20:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/02/04 10:20:24 INFO SparkUI: Started SparkUI at http://192.168.1.45:4040
15/02/04 10:20:24 INFO Executor: Using REPL class URI: http://192.168.1.45:48135
15/02/04 10:20:24 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.45:34171/user/HeartbeatReceiver
15/02/04 10:20:24 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala> val x = 3
x: Int = 3

scala> 
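
For example, a small job along these lines (assuming the shell is started from the Spark directory, so that README.md is in the working directory) can be typed directly at the scala> prompt:

    val readme = sc.textFile("README.md")                         // build an RDD from a local text file
    val sparkLines = readme.filter(_.contains("Spark")).count()   // count the lines that mention "Spark"
    println(s"Lines mentioning Spark: $sparkLines")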

Now suppose I want to use Spark from a Scala file, for example:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

How can I do that?

Donbeo
  • There is a step-by-step guide to installation [here](http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/), if that helps. What part(s) are you unsure of? – DNA Feb 03 '15 at 22:03
  • @DNA I am able to follow the guide and use the Spark shell from the terminal. Now how can I use Spark in a new Scala project? – Donbeo Feb 04 '15 at 09:45
  • See the [programming guide](https://spark.apache.org/docs/latest/programming-guide.html), and the examples included with Spark, linked from the [Where to go from here](https://spark.apache.org/docs/latest/programming-guide.html#where-to-go-from-here) section. (If you are unsure how to run a basic Scala program, you first need to study a tutorial on that before attempting to write Spark jobs). – DNA Feb 04 '15 at 10:36
  • I have a basic knowledge of Scala and I am able to run Scala programs, but I do not understand how to do that with Spark. I can run simple Scala programs, or the examples shipped with Spark, but not a new example. (For example, if I copy SparkPi into a new file SparkPi2, how can I run it?) – Donbeo Feb 04 '15 at 12:07
  • The question seems to be more complex than expected. I will accept the answer and ask a new question on how to submit tasks. – Donbeo Feb 04 '15 at 14:07

1 Answer


If you're just planning to run Spark on a single machine for learning, you can use local (1 core) or local[*] (all cores) as the "master". It then runs just like a normal JVM process, even in an IDE or debugger. I wrote a do-it-yourself workshop that works this way, https://github.com/deanwampler/spark-workshop, if you need an example.

If local mode is sufficient, one of the binary downloads will have everything you need.
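
For instance, here is a minimal sketch (assuming the SimpleApp example from the question and Spark 1.1.0 as shown in the shell output; the object name is illustrative) of setting the master in code so the job runs as a plain JVM process:

    /* SimpleAppLocal.scala -- illustrative variant of the question's SimpleApp */
    import org.apache.spark.{SparkConf, SparkContext}

    object SimpleAppLocal {
      def main(args: Array[String]) {
        val conf = new SparkConf()
          .setAppName("Simple Application")
          .setMaster("local[*]")                            // all local cores; "local" would use one
        val sc = new SparkContext(conf)
        val logData = sc.textFile("README.md", 2).cache()   // any readable text file will do
        val numAs = logData.filter(_.contains("a")).count()
        println("Lines with a: " + numAs)
        sc.stop()
      }
    }

With the master set this way, the object can be launched with `sbt run` or from an IDE like any other Scala program; no cluster or spark-submit is required.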

Dean Wampler
  • I am able to run your tutorial with sbt or Activator. But how can I start a new Spark project? – Donbeo Feb 04 '15 at 06:13
  • If you like the general approach, you could copy the project and delete the source files and scripts you don't want, then modify the remaining files to create your new app. If you want to run your app (or my examples) using Spark's own `spark-submit` script, use the sbt/activator shell command `package` to create a jar file, then follow the instructions for using it with `spark-submit`. Note that you'll need to download a Spark distribution to get those scripts, etc. (a rough sketch of this setup appears after these comments). – Dean Wampler Feb 04 '15 at 14:23
  • Thanks, but I am still having problems. I have asked a more specific question here: http://stackoverflow.com/questions/28324984/submit-task-to-spark – Donbeo Feb 04 '15 at 15:24
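
For reference, a rough sketch of the sbt setup described in the comment above (the file and project names follow the Spark quick start and are illustrative, not from this thread):

    // simple.sbt -- in the project root, with the source at src/main/scala/SimpleApp.scala
    name := "Simple Project"

    version := "1.0"

    scalaVersion := "2.10.4"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"

Running `sbt package` should then produce a jar under target/scala-2.10/ (simple-project_2.10-1.0.jar with the names above), which can be submitted from the Spark directory with something like `./bin/spark-submit --class "SimpleApp" --master "local[*]" target/scala-2.10/simple-project_2.10-1.0.jar`.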