I'm wondering how to remove the cached data of a specific Spark DataFrame from memory.
For example:

```python
sdf = spark.read.table('example')
sdf.count()  # -> and sdf will be cached in memory
```

After the `sdf.count()`, `sdf` is held in memory. I'd like to remove it from memory to free up space.
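For reference, here is a minimal sketch of how the cache state can be inspected (assuming an existing SparkSession `spark` and the hypothetical table `'example'` from the snippet above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sdf = spark.read.table('example')  # same hypothetical table as above
sdf.count()

# Ways to inspect the caching state of this specific DataFrame:
print(sdf.is_cached)                      # bool flag on the DataFrame itself
print(sdf.storageLevel)                   # StorageLevel(False, False, False, False, 1) if not persisted
print(spark.catalog.isCached('example'))  # whether the table itself is cached in the catalog
```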
I know of two possible candidates, but neither of them solves the problem above (sketched below):

- `spark.catalog.clearCache()`: this clears the in-memory cache of all tables, not just `sdf`.
- `sdf.unpersist()`: this only works after code such as `sdf.repartition(200).persist()` followed by `sdf.count()`.
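For clarity, a minimal sketch of how I have tried these two candidates (same assumptions as above: an existing SparkSession `spark` and the hypothetical table `'example'`):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.read.table('example')  # same hypothetical table as above

# Candidate 1: clears the in-memory cache of *all* tables/DataFrames, not just sdf
spark.catalog.clearCache()

# Candidate 2: unpersist() only has a visible effect after an explicit persist(),
# e.g. with the pattern from the question:
sdf2 = sdf.repartition(200).persist()
sdf2.count()      # materializes the cached partitions
sdf2.unpersist()  # frees only this explicitly persisted DataFrame
```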
In addition, I have to use Spark v2.4.0 due to constraints of my working environment.
Could anyone please tell me how to achieve this?