I am using Spark to run a C++ binary that uses a lot of memory.
I am running it like this:
rdd.map(lambda x: subprocess.check_call(["./high_memory_usage_executable"]))
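For context, a simplified sketch of the surrounding job; the action and the return-code handling here are illustrative, not my exact code:

    import subprocess

    def run_binary(_):
        # call() returns the exit status; a negative value (-9 here) means
        # the child was killed by that signal (9 = SIGKILL)
        return subprocess.call(["./high_memory_usage_executable"])

    # an action (collect/count) is needed to actually run the map on the executors
    return_codes = rdd.map(run_binary).collect()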
I have been getting a -9 return code, which means the executable was killed by signal 9 (SIGKILL), i.e. the OS is killing it because it runs out of memory.
I have set:
spark.mesos.executor.memoryOverhead=20480
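(Passed roughly like this via spark-submit; the master URL and script name are placeholders:)

    spark-submit \
      --master mesos://<master-host>:<port> \
      --conf spark.mesos.executor.memoryOverhead=20480 \
      my_job.py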
The executable should need around 10g (10240 MB) of memory, so the memoryOverhead setting is very generous.
Questions:
- How do I profile the memory usage of the executable and debug why it is failing? (A rough profiling idea is sketched below.)
- I worry that multiple ./high_memory_usage_executable instances are running at the same time on each executor. How can I force each executor to run only one high_memory_usage_executable at a time? (A possible config-based approach is also sketched below.)
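For the first question, one idea is to wrap the binary with GNU time so the peak RSS of each run ends up in the task output; a rough sketch, assuming /usr/bin/time (GNU time) is installed on the executor hosts:

    import subprocess

    def run_and_profile(_):
        # GNU time -v prints "Maximum resident set size (kbytes)" on stderr,
        # and still reports it when the child is killed by a signal
        proc = subprocess.Popen(
            ["/usr/bin/time", "-v", "./high_memory_usage_executable"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
        )
        _, err = proc.communicate()
        # ship the return code and the time(1) report back to the driver
        return proc.returncode, err.decode(errors="replace")

    for rc, report in rdd.map(run_and_profile).collect():
        print(rc)
        print(report)

Is that a reasonable way to see how much memory each run actually peaks at?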
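For the second question, would setting spark.task.cpus equal to spark.executor.cores be the right way to guarantee at most one task, and therefore one running binary, per executor at a time? Something like this (the core counts are placeholders):

    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .set("spark.executor.cores", "4")  # cores per executor (placeholder value)
        .set("spark.task.cpus", "4")       # each task claims all cores -> one task per executor
    )
    sc = SparkContext(conf=conf)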