On my MacBook Pro with 8 GB of RAM, using JDK 1.8.0_20 and Scala 2.10.4 with Spark 1.0.2, I tried a simple line count on a ~800 MB file in a standalone Scala app.

I just ran spark/sbin/start-all.sh (the only thing I configured is the IP) and expected the app to work.
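
For reference, the only standalone configuration I set is the master IP in conf/spark-env.sh, along these lines (everything else is left at its default):

export SPARK_MASTER_IP=192.168.188.25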

$ sbt run

It connects, starts executing, and then dies with OutOfMemory errors.

Now the annoying part: running the same code via spark/bin/spark-submit or the spark-shell produces a valid result without any memory errors. I do the following:

$ sbt package
$ ./bin/spark-submit --class Application \
                     --master spark://192.168.188.25:7077 \
                      sparktest_2.10-1.0.jar

and get the correct output.

Source:

import org.apache.spark.{SparkConf, SparkContext}

object Application {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Sparkling")
      .setMaster("spark://192.168.188.25:7077")

    val sc = new SparkContext(conf)

    val xml = sc.textFile("demo.xml")                // the ~800 MB file
    println("partitions: " + xml.partitions.length)  // prints 26
    println("length: " + xml.count)                  // prints 21858279

    sc.stop()
  }
}

Any ideas? Using more partitions is the only thing I have tried, and it does not help :/
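
By "more partitions" I mean passing a minPartitions hint when reading the file, something like the line below (100 is just an arbitrary example value):

val xml = sc.textFile("demo.xml", 100) // request ~100 partitions instead of the default 26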

What other information can I provide? All memory flags are at their defaults.

MomStopFlashing
  • Maybe http://stackoverflow.com/questions/3868863/how-to-specify-jvm-maximum-heap-size-xmx-for-running-an-application-with-run or http://stackoverflow.com/questions/16640823/sbt-runs-out-of-memory could work? – Naetmul Aug 10 '14 at 15:05

1 Answer


(Please provide the full OOM stack trace, and tell us when it OOMs: before the Spark job is submitted, or during job execution? More logs and a fuller description are needed to say for certain what's going on.)

It seems like your driver is OOMing because of the different default memory settings used when starting sbt versus starting the spark-shell or using spark-submit. I don't think it's your job that is OOMing, because, as you say, it runs just fine when submitted the other way. So if you fiddle around with sbt's memory settings it should work. In other words, I don't think this is a Spark problem but an sbt problem, so following the links Naetmul provided in the comment should fix it.
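
For example, a minimal sketch (the 2g value is only a guess, adjust it to your machine): without forking, sbt run executes your main class inside sbt's own JVM, so you can either give that JVM more heap via the launcher, or fork the run into a separate JVM with its own settings in build.sbt:

// build.sbt (sbt 0.13 syntax): run the app in a forked JVM with a bigger heap
fork in run := true
javaOptions in run += "-Xmx2g"

Giving sbt itself more memory (for instance via the SBT_OPTS environment variable, if your launcher script honours it, as discussed in the linked questions) achieves the same thing when run is not forked.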

Alternatively (and this is the way I usually run my jobs), don't use sbt run at all: build the jar and launch it with java -cp your.jar.
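
Roughly like this; it's only a sketch, and the exact assembly jar name and paths depend on your Spark distribution, so adjust them to whatever ships in your spark/lib directory. The upside is that you control the driver heap directly with -Xmx:

$ sbt package
$ java -Xmx2g \
       -cp target/scala-2.10/sparktest_2.10-1.0.jar:/path/to/spark/lib/spark-assembly-*.jar \
       Application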

samthebest