On my MacBook Pro with 8 GB RAM, using JDK 1.8.0_20, Scala 2.10.4, and Spark 1.0.2, I tried a simple line count on an ~800 MB file in a standalone Scala app.
I just run spark/sbin/start-all.sh (the only configuration is the IP) and expect the app to work:
$ sbt run
It connects, starts executing, and then dies with OutOfMemoryErrors.
Now the annoying part: running the same code via spark/bin/spark-submit or the spark-shell produces a valid result without any memory errors. I do the following:
$ sbt package
$ ./bin/spark-submit --class Application \
--master spark://192.168.188.25:7077/ \
sparktest_2.10-1.0.jar
and get the correct output.
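For reference, the project's build definition is along these lines (a minimal sketch reconstructed from the jar name and versions above; the exact build.sbt may differ slightly):

name := "sparktest"

version := "1.0"

scalaVersion := "2.10.4"

// Spark core as a regular dependency so `sbt run` can start the driver in-process
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2"

`sbt package` then produces target/scala-2.10/sparktest_2.10-1.0.jar, which is the jar I pass to spark-submit.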
Source:
import org.apache.spark.{SparkConf, SparkContext}

object Application {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Sparkling")
      .setMaster("spark://192.168.188.25:7077")
    val sc = new SparkContext(conf)

    val xml = sc.textFile("demo.xml")
    println("partitions: " + xml.partitions.length) // 26
    println("length: " + xml.count)                 // 21858279

    sc.stop()
  }
}
Any ideas? I have already tried using more partitions, but that does not help :/
What other information can I provide? All memory flags are at their defaults.
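In particular, there is nothing like the following in the build; this is only a hypothetical sketch of the kind of settings I have not touched, to illustrate what "default" means here:

// NOT present in my build.sbt – shown only for illustration
fork in run := true
javaOptions in run ++= Seq("-Xmx2g")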