I am using Spark to run a C++ binary that uses a lot of memory.
I am running it like this:
rdd.map(lambda x: subprocess.check_call(["./high_memory_usage_executable"]))
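For context, a simplified sketch of the surrounding job; the action and the return-code handling here are illustrative, not my exact code:

    import subprocess

    def run_binary(_):
        # call() returns the exit status; a negative value (-9 here) means
        # the child was killed by that signal (9 = SIGKILL)
        return subprocess.call(["./high_memory_usage_executable"])

    # an action (collect/count) is needed to actually run the map on the executors
    return_codes = rdd.map(run_binary).collect()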
I have been getting a -9 return code, which means the executable was killed by signal 9 (SIGKILL), i.e. the OS is killing it because it runs out of memory.
I have set:
spark.mesos.executor.memoryOverhead=20480
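(Passed roughly like this via spark-submit; the master URL and script name are placeholders:)

    spark-submit \
      --master mesos://<master-host>:<port> \
      --conf spark.mesos.executor.memoryOverhead=20480 \
      my_job.py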
The executable should need around 10g (10240 MB) of memory, so the memoryOverhead setting is very generous.
Questions:
- How do I profile the memory usage of the executable and debug why it is failing? (A rough profiling idea is sketched below.)
- I worry that multiple ./high_memory_usage_executable instances are running at the same time on each executor. How can I force each executor to run only one high_memory_usage_executable at a time? (A possible config-based approach is also sketched below.)
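For the first question, one idea is to wrap the binary with GNU time so the peak RSS of each run ends up in the task output; a rough sketch, assuming /usr/bin/time (GNU time) is installed on the executor hosts:

    import subprocess

    def run_and_profile(_):
        # GNU time -v prints "Maximum resident set size (kbytes)" on stderr,
        # and still reports it when the child is killed by a signal
        proc = subprocess.Popen(
            ["/usr/bin/time", "-v", "./high_memory_usage_executable"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
        )
        _, err = proc.communicate()
        # ship the return code and the time(1) report back to the driver
        return proc.returncode, err.decode(errors="replace")

    for rc, report in rdd.map(run_and_profile).collect():
        print(rc)
        print(report)

Is that a reasonable way to see how much memory each run actually peaks at?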
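For the second question, would setting spark.task.cpus equal to spark.executor.cores be the right way to guarantee at most one task, and therefore one running binary, per executor at a time? Something like this (the core counts are placeholders):

    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .set("spark.executor.cores", "4")  # cores per executor (placeholder value)
        .set("spark.task.cpus", "4")       # each task claims all cores -> one task per executor
    )
    sc = SparkContext(conf=conf)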