I'm wondering how to remove the cached data of a specific Spark DataFrame from memory.
For example:

```python
sdf = spark.read.table('example')
sdf.count()  # -> and sdf will be cached in memory
```

After the `sdf.count()`, `sdf` is held in memory. I'd like to remove it from memory to free up space.
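For reference, here is a minimal sketch of how the cache state can be inspected (assuming an existing SparkSession `spark` and the hypothetical table `'example'` from the snippet above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sdf = spark.read.table('example')  # same hypothetical table as above
sdf.count()

# Ways to inspect the caching state of this specific DataFrame:
print(sdf.is_cached)                      # bool flag on the DataFrame itself
print(sdf.storageLevel)                   # StorageLevel(False, False, False, False, 1) if not persisted
print(spark.catalog.isCached('example'))  # whether the table itself is cached in the catalog
```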
I know of two possible candidates, but neither of them solves the problem above (sketched below):

- `spark.catalog.clearCache()`: this clears the in-memory cache of all tables, not just `sdf`.
- `sdf.unpersist()`: this only works after code such as `sdf.repartition(200).persist()` followed by `sdf.count()`.
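For clarity, a minimal sketch of how I have tried these two candidates (same assumptions as above: an existing SparkSession `spark` and the hypothetical table `'example'`):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.read.table('example')  # same hypothetical table as above

# Candidate 1: clears the in-memory cache of *all* tables/DataFrames, not just sdf
spark.catalog.clearCache()

# Candidate 2: unpersist() only has a visible effect after an explicit persist(),
# e.g. with the pattern from the question:
sdf2 = sdf.repartition(200).persist()
sdf2.count()      # materializes the cached partitions
sdf2.unpersist()  # frees only this explicitly persisted DataFrame
```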
In addition, I have to use Spark v2.4.0 due to constraints of my working environment.
Could anyone please tell me how to achieve this?