The off-heap memory usage of the 3 Spark executor processes keeps increasing until it hits the limits of the physical RAM. This happened two weeks ago, at which point the system came to a grinding halt because it was unable to spawn new processes. At that moment, restarting Spark was the obvious solution. In the collectd memory usage graph below you can see the two moments we restarted Spark: last week, when we upgraded Spark from 1.4.1 to 1.5.1, and two weeks ago, when the physical memory was exhausted.

[Image: collectd memory usage of the Spark box]

As can be seen below, the Spark executor process uses approx. 62 GB of memory (RSS), while the maximum heap size is set to 20 GB. That leaves approx. 42 GB of off-heap memory usage.

$ ps aux | grep 40724
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
apache-+ 40724  140 47.1 75678780 62181644 ?   Sl   Nov06 11782:27 /usr/lib/jvm/java-7-oracle/jre/bin/java -cp /opt/spark-1.5.1-bin-hadoop2.4/conf/:/opt/spark-1.5.1-bin-hadoop2.4/lib/spark-assembly-1.5.1-hadoop2.4.0.jar:/opt/spark-1.5.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.5.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.5.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar -Xms20480M -Xmx20480M -Dspark.driver.port=7201 -Dspark.blockManager.port=7206 -Dspark.executor.port=7202 -Dspark.broadcast.port=7204 -Dspark.fileserver.port=7203 -Dspark.replClassServer.port=7205 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@xxx.xxx.xxx.xxx:7201/user/CoarseGrainedScheduler --executor-id 2 --hostname xxx.xxx.xxx.xxx --cores 10 --app-id app-20151106125547-0000 --worker-url akka.tcp://sparkWorker@xxx.xxx.xxx.xxx:7200/user/Worker
$ sudo -u apache-spark jps
40724 CoarseGrainedExecutorBackend
40517 Worker
30664 Jps
$ sudo -u apache-spark jstat -gc 40724
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT   
158720.0 157184.0 110339.8  0.0   6674944.0 1708036.1 13981184.0 2733206.2  59904.0 59551.9  41944 1737.864  39     13.464 1751.328
$ sudo -u apache-spark jps -v
40724 CoarseGrainedExecutorBackend -Xms20480M -Xmx20480M -Dspark.driver.port=7201 -Dspark.blockManager.port=7206 -Dspark.executor.port=7202 -Dspark.broadcast.port=7204 -Dspark.fileserver.port=7203 -Dspark.replClassServer.port=7205 -XX:MaxPermSize=256m
40517 Worker -Xms2048m -Xmx2048m -XX:MaxPermSize=256m
10693 Jps -Dapplication.home=/usr/lib/jvm/java-7-oracle -Xms8m
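The jstat output above shows the heap and permgen well within their limits, so the growth has to be outside the garbage-collected heap. To narrow down whether it is JVM-managed off-heap memory (NIO direct/mapped buffers) or native allocations the JVM doesn't track, I'm thinking of logging the platform buffer pools from inside the executor. This is only a rough sketch (the OffHeapLogger class name is mine, not something from Spark):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.List;

public class OffHeapLogger {

    // Prints heap, non-heap and NIO buffer-pool usage as seen by the JVM itself.
    public static void logOnce() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println("heap used:     " + mem.getHeapMemoryUsage().getUsed() + " bytes");
        System.out.println("non-heap used: " + mem.getNonHeapMemoryUsage().getUsed() + " bytes");

        // "direct" and "mapped" are the JVM-visible off-heap pools (NIO buffers).
        List<BufferPoolMXBean> pools = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.println(pool.getName() + " pool: " + pool.getMemoryUsed()
                    + " bytes in " + pool.getCount() + " buffers");
        }
    }

    public static void main(String[] args) {
        logOnce();
    }
}

If the reported direct/mapped buffer usage stays small while the RSS keeps climbing, the leak would presumably be in native memory that the JVM doesn't account for at all.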

Some info:

  • We use the Spark Streaming lib (a stripped-down sketch of the pipeline follows this list).
  • Our code is written in Java.
  • We run Oracle Java v1.7.0_76.
  • Data is read from Kafka (Kafka runs on different boxes).
  • Data is written to Cassandra (Cassandra runs on different boxes).
  • 1 Spark master and 3 Spark executors/workers, running on 4 separate boxes.
  • We recently upgraded Spark (to 1.4.1 and then to 1.5.1), and the memory usage pattern is identical across those versions.

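For context, the job itself is basically the standard receiver-based Kafka → Spark Streaming → Cassandra pipeline. The sketch below is a stripped-down stand-in, not our actual code: the topic, consumer group and ZooKeeper host are made up, and the Cassandra write (done with the spark-cassandra-connector in the real job) is reduced to a comment.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import scala.Tuple2;

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("kafka-to-cassandra");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Receiver-based Kafka stream; topic, group and ZooKeeper values are placeholders.
        Map<String, Integer> topics = new HashMap<String, Integer>();
        topics.put("events", 1);
        JavaPairReceiverInputDStream<String, String> kafkaStream =
                KafkaUtils.createStream(jssc, "zk-host:2181", "consumer-group", topics);

        // Keep only the message payload.
        JavaDStream<String> messages = kafkaStream.map(new Function<Tuple2<String, String>, String>() {
            @Override
            public String call(Tuple2<String, String> record) {
                return record._2();
            }
        });

        messages.foreachRDD(new Function<JavaRDD<String>, Void>() {
            @Override
            public Void call(JavaRDD<String> rdd) {
                // In the real job each batch is parsed and written to Cassandra
                // via the spark-cassandra-connector; omitted here.
                rdd.count();
                return null;
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
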
What can be the cause of this ever-increasing off-heap memory use?
