
I am using Spark to run a C++ binary that uses a lot of memory.

I am running it like this:

rdd.map(lambda x: subprocess.check_call(["./high_memory_usage_executable"]))

I have been getting a -9 return code, which means the executable is being killed by SIGKILL, most likely by the OS out-of-memory (OOM) killer.
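For reference, here is a minimal self-contained sketch of the setup (the SparkContext creation and the input RDD are assumed; subprocess.call is used instead of check_call so the return codes can be collected and inspected rather than raised as exceptions):

    import subprocess
    from pyspark import SparkContext

    sc = SparkContext(appName="high-memory-binary")  # assumed app name

    # One record per planned invocation; the real input data is assumed.
    rdd = sc.parallelize(range(100))

    # subprocess.call returns the exit status instead of raising, so a -9
    # (SIGKILL, typically from the OOM killer) shows up in the results.
    return_codes = rdd.map(
        lambda x: subprocess.call(["./high_memory_usage_executable"])
    ).collect()

    print(return_codes)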

I have set:

spark.mesos.executor.memoryOverhead=20480

The executable should need around 10g (10240 MB) of memory, so the memoryOverhead setting is very generous.
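For completeness, a sketch of one way to supply these settings programmatically (the spark.executor.memory value is illustrative and not from my actual configuration):

    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setAppName("high-memory-binary")                     # assumed name
        .set("spark.mesos.executor.memoryOverhead", "20480")  # MB
        .set("spark.executor.memory", "4g")                   # illustrative
    )
    sc = SparkContext(conf=conf)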

Questions:

  1. How do I profile the memory usage of the executable and debug why it is failing?
  2. I worry that multiple ./high_memory_usage_executable processes are running per executor. How can I force each executor to run only one high_memory_usage_executable at a time? (A configuration sketch of what I mean follows below.)
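For question 2, this is a configuration sketch of what I mean by one executable per executor (assuming the cluster manager honors spark.task.cpus and spark.executor.cores; the core counts are illustrative):

    from pyspark import SparkConf, SparkContext

    # If each task claims as many CPUs as the executor has cores, only one
    # task (and therefore one ./high_memory_usage_executable) can run per
    # executor at a time. The "4" values are illustrative.
    conf = (
        SparkConf()
        .set("spark.executor.cores", "4")
        .set("spark.task.cpus", "4")
    )
    sc = SparkContext(conf=conf)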