
Is there a defined standard for effective memory management in Spark?

What if I create a couple of DataFrames or RDDs and then keep reducing that data with joins and aggregations?

Will these DataFrames or RDDs still hold resources until the session or job is complete?

Dixon

1 Answer


No, there is not. The lifetime of Spark's main entity, the RDD, is defined via its lineage. When your job calls an action, the whole DAG starts executing. If the job completes successfully, Spark releases all reserved resources; otherwise it tries to re-execute the failed tasks, reconstructing the lost RDDs from their lineage.
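Here is a minimal sketch illustrating that idea: the joins and aggregations are lazy and only build lineage, the action triggers the DAG, and `cache()`/`unpersist()` are the explicit knobs if you want to control the memory of a reused intermediate result. The data, column names, and app name are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object LineageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lineage-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Transformations are lazy: these joins/aggregations only build lineage.
    // No execution resources are held for the intermediate DataFrames yet.
    val orders    = Seq((1, "a", 10.0), (2, "b", 20.0)).toDF("id", "cust", "amount")
    val customers = Seq(("a", "US"), ("b", "DE")).toDF("cust", "country")

    val joined     = orders.join(customers, "cust")
    val aggregated = joined.groupBy("country").agg(sum("amount").as("total"))

    // The action below triggers execution of the whole DAG; once the job
    // finishes, Spark frees the execution memory it reserved for the tasks.
    aggregated.show()

    // If an intermediate result is reused across several actions, caching it
    // (and unpersisting it when done) is the explicit way to manage its memory.
    joined.cache()
    joined.count()     // materializes the cache
    joined.unpersist() // releases the cached blocks

    spark.stop()
  }
}
```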

Please check the following resources to get familiar with these concepts:

What is Lineage In Spark?

What is the difference between RDD Lineage Graph and Directed Acyclic Graph (DAG) in Spark?

abiratsis