
I have built a Spark and a Flink k-means application. My test case is clustering 1 million points on a 3-node cluster.

When memory becomes the bottleneck, Flink starts spilling to disk and works slowly, but it works. However, Spark loses executors when memory is full and restarts them (an infinite loop?).

I tried to customize the memory settings with help from the mailing list, thanks. But Spark still does not work.

Are there any configurations that need to be set? I mean, Flink works with low memory, so Spark should be able to as well, shouldn't it?
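For reference, these are the memory-related settings typically tuned for this kind of problem on Spark 1.x (which was current when this was asked). This is only a sketch: the class name, master URL, jar name, and all values are placeholders to adapt to your own 3-node cluster; the `--conf` keys are standard Spark 1.x configuration properties.

```shell
# Sketch: memory-related spark-submit flags (all values illustrative).
spark-submit \
  --class com.example.KMeansApp \             # hypothetical application class
  --master spark://master:7077 \              # hypothetical master URL
  --executor-memory 4G \                      # heap size per executor
  --conf spark.storage.memoryFraction=0.4 \   # heap share for cached RDDs (Spark 1.x key)
  --conf spark.shuffle.memoryFraction=0.4 \   # heap share for shuffle buffers (Spark 1.x key)
  kmeans-assembly.jar
```

Lowering `spark.storage.memoryFraction` (default 0.6) and raising `spark.shuffle.memoryFraction` (default 0.2) leaves more heap for aggregation buffers, which is where grouping-heavy jobs often run out of memory.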

Matthias J. Sax
Pa Rö

1 Answer


I am not a Spark expert (I am a Flink contributor). As far as I know, Spark is not able to spill to disk if there is not enough main memory. This is one advantage of Flink over Spark. However, Spark has announced a new project called "Tungsten" to enable managed memory similar to Flink's. I don't know if this feature is available yet: https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html
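One thing Spark does support at the API level is spilling *cached* RDDs to disk via a storage level, even though operator-internal data structures still live on the heap. A minimal sketch in Scala (the file path and parsing function are illustrative; `sc` is assumed to be an existing `SparkContext`):

```scala
import org.apache.spark.storage.StorageLevel

// Sketch: cache with a storage level that spills partitions to disk
// when they do not fit in memory, instead of evicting and recomputing them.
val points = sc.textFile("hdfs:///data/points")   // hypothetical input path
  .map(line => line.split(",").map(_.toDouble))   // hypothetical point parser
points.persist(StorageLevel.MEMORY_AND_DISK)
```

Note this only affects cached data; shuffle and aggregation buffers can still exhaust the heap, which matches the behavior described in the question.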

There are a couple of SO questions about Spark out-of-memory problems (an Internet search for "spark out of memory" yields many results, too):

  • spark java.lang.OutOfMemoryError: Java heap space
  • Spark runs out of memory when grouping by key
  • Spark out of memory

Maybe one of those helps.

Matthias J. Sax
Spark can serialize data to disk but requires parts of the data to be on the JVM's heap for certain operations. If the size of the heap is not sufficient, the job dies with an OutOfMemoryError. In contrast, Flink's engine does not accumulate lots of objects on the heap but stores them in a dedicated memory region. All operators are implemented in such a way that they can cope with very little memory and can spill to disk. This [blog post](http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html) discusses Flink's memory management and how it operates on binary data. – Fabian Hueske Aug 11 '15 at 07:57