We recently started caching RDDs that are reused multiple times, even when those RDDs don't take long to compute.
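For context, this is roughly the pattern (a minimal Scala sketch; the input path and transformations are just placeholders):

```scala
import org.apache.spark.SparkContext

object CachingExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "caching-example")

    // A cheap-to-compute RDD that several downstream jobs reuse.
    val parsed = sc.textFile("hdfs:///data/events") // hypothetical input path
      .map(_.split(","))

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY):
    // partitions are stored as deserialized objects on the JVM heap.
    parsed.cache()

    // Multiple actions reuse the cached partitions instead of
    // re-reading and re-parsing the input each time.
    val total    = parsed.count()
    val nonEmpty = parsed.filter(_.length > 2).count()

    println(s"total=$total, nonEmpty=$nonEmpty")
    sc.stop()
  }
}
```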
According to the docs, Spark automatically evicts unused cached data using an LRU strategy.
So is there any drawback to over-caching RDDs? I was thinking that keeping all that deserialized data in memory could put more pressure on the GC, but is that something we should actually worry about?