
The title says it all: I want a way to check the Java runtime variables of the executor JVMs that get created, but I am working with PySpark. How can I access `java.lang.Runtime.getRuntime().maxMemory()` on the executors from PySpark?
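
On the driver this works through PySpark's internal py4j gateway, as one of the comments below also points out (a sketch; `sc._gateway` is a private attribute and this assumes an existing SparkContext `sc`):

# driver side only: py4j forwards the call into the driver JVM
driver_max_heap = sc._gateway.jvm.java.lang.Runtime.getRuntime().maxMemory()
print(driver_max_heap)  # max heap of the driver JVM, in bytes

What I need is the same reading from each executor JVM.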

Based on the comments, I have tried the following code, but both approaches are unsuccessful.

# create an RDD
l = sc.range(100)

Now I have to run `sc._gateway.jvm.java.lang.Runtime.getRuntime().maxMemory()` on each executor, so I do the following:

l.map(lambda x:sc._gateway.jvm.java.lang.Runtime.getRuntime().maxMemory()).collect()

which results in:

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

As SPARK-5063 says, the SparkContext can only be used on the driver: the lambda captures `sc`, and Spark refuses to serialize it for the workers.
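
This isn't specific to `Runtime`; any closure that references `sc` fails the same way when Spark tries to serialize it (a minimal reproduction):

# same SPARK-5063 error: the closure captures sc, and SparkContext
# cannot be pickled for shipping to workers
l.map(lambda x: sc.defaultParallelism).collect()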

I also tried

func = sc._gateway.jvm.java.lang.Runtime.getRuntime()
l.map(lambda x:func.maxMemory()).collect()

which results in the following error:

TypeError: cannot pickle '_thread.RLock' object

Presumably the py4j proxy holds a live gateway connection (with its locks), so it cannot be pickled and shipped to the workers either.
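
For reference, a driver-side approximation is to read the configured executor heap, i.e. the `-Xmx` the executor JVMs are launched with, rather than a live `Runtime` reading (a sketch, assuming the default of 1g when the property is unset):

# static configuration, not a live Runtime.maxMemory() reading:
# the heap size each executor JVM is launched with
print(sc.getConf().get("spark.executor.memory", "1g"))  # e.g. '1g'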
  • Is it for the driver JVM or for the workers? – ernest_k Dec 20 '22 at 11:28
  • Workers. Sorry for not making that clear. – figs_and_nuts Dec 20 '22 at 11:29
  • You may want to take a look at this answer: https://stackoverflow.com/a/35725213/5761558. I was able to do something like `func = sc._gateway.jvm.java.lang.Runtime.getRuntime()` followed by `func.maxMemory()`, and that returned the max memory. You just need to orchestrate that so it runs on the workers (maybe using a UDF or other RDD distributed calls). – ernest_k Dec 20 '22 at 11:37
  • I'm finding it difficult to run that. Modified the question describing the bottlenecks. – figs_and_nuts Dec 20 '22 at 18:11
