
I've been using Apache Spark to write a desktop app that lets you tamper with data interactively. I recently started reading "Learning Spark", and the author says that in local mode (when the master is set to `local`) Spark uses only one thread.

How can I take advantage of all of the cores in my computer without having a full-blown Spark cluster on my computer?

I'm using Java / Kotlin.

Adam Arold

1 Answer


It defaults to one thread, but you can specify how many you'd like, like so:

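 import org.apache.spark.SparkConf
 import org.apache.spark.SparkContext

 val config = SparkConf() // the class is SparkConf, not SparkConfig
 config.setMaster("local[8]") // local mode, using 8 worker threads (you can vary the number)
 config.setAppName("qwerty")
 val context = SparkContext(config)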
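
If you'd rather not hard-code the number, Spark also accepts `local[*]` as a master URL, which asks for one worker thread per logical core. Below is a minimal Java sketch of building either string. `LocalMaster` and its methods are hypothetical helper names, not part of Spark, and capping the requested count at the number of logical cores is just one possible policy.

```java
// Sketch only: LocalMaster is a hypothetical helper, not part of Spark.
final class LocalMaster {
    private LocalMaster() {}

    /** Master URL that asks Spark for one worker thread per logical core. */
    static String allCores() {
        return "local[*]";
    }

    /** Master URL for a fixed thread count, capped at the number of logical cores. */
    static String forThreads(int requested) {
        if (requested < 1) {
            throw new IllegalArgumentException("thread count must be >= 1, got " + requested);
        }
        int cores = Runtime.getRuntime().availableProcessors();
        return "local[" + Math.min(requested, cores) + "]";
    }

    public static void main(String[] args) {
        // Either string can be passed to SparkConf.setMaster(...), for example:
        //   new SparkConf().setMaster(LocalMaster.allCores()).setAppName("qwerty")
        System.out.println(LocalMaster.allCores());
        System.out.println(LocalMaster.forThreads(8));
    }
}
```

In the snippet above, that would mean calling `config.setMaster(LocalMaster.allCores())` instead of passing `"local[8]"`.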
hudsonb
  • That's cool. And how many threads are recommended per core? Does this mean that just by adding Spark more threads my program will get automatically faster without touching the code? – Adam Arold Apr 23 '18 at 15:56
  • I've always used a 1:1 ratio (on an 8-core machine, I set it to 8 Spark "cores"), but I can't recall any specific reason why I do it that way. If you were previously using just one, I would think you'd see at least some boost in performance; how much likely depends on the use case. – hudsonb Apr 23 '18 at 16:00
  • @AdamArold _Does this mean that just by adding Spark more threads my program will get automatically faster without touching the code_ - [not really](https://stackoverflow.com/q/41090127/6910411). In general (depending on the data, code and overall configuration) it will keep getting slower and slower :) – zero323 Apr 23 '18 at 18:22
  • I see. I'm only doing row-based transformations on my `Dataset`s like `replace`, `toLowerCase`, stuff like that. Do I need to touch my code in order to get a speedup? – Adam Arold Apr 24 '18 at 08:30
  • Update: I just tested it with `2`, `4` and `8` but it actually slowed down compared to the simple `local` option. – Adam Arold Apr 24 '18 at 12:37
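
When comparing `local` against `local[2]`, `local[4]` and `local[8]`, it helps to time each setting the same way. The following is a rough, Spark-free Java sketch: `Timing` and `medianMillis` are hypothetical names, and the loop in `main` is only a placeholder workload. Because Spark transformations are lazy, the `Runnable` you actually time would need to end in an action such as `count()`.

```java
import java.util.Arrays;

// Sketch only: Timing is a hypothetical helper, not part of Spark.
final class Timing {
    private Timing() {}

    /**
     * Runs the job once as a warm-up, then `runs` more times, and returns the
     * median wall-clock time in milliseconds (the upper median for even counts).
     */
    static long medianMillis(Runnable job, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1, got " + runs);
        }
        job.run(); // warm-up, so JIT compilation and lazy initialisation are not measured
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            job.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; replace it with the Spark job under test.
        long ms = medianMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable; keeps the loop from being optimised away");
            }
        }, 5);
        System.out.println("median ms: " + ms);
    }
}
```

Running the same measurement once per master setting, on the same input, gives numbers that can be compared directly.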