1. The default RDD eviction strategy is LRU. When memory is not sufficient for RDD caching, some partitions are evicted; if those partitions are needed again later, they are recomputed from their lineage information and cached in memory again. Cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level.
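Here's a minimal sketch of that behavior (the input path `data.txt`, the RDD names, and the transformations are made up for illustration). With MEMORY_ONLY, partitions that don't fit are evicted rather than spilled, and Spark replays the lineage to rebuild them on a later action:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("lru-cache-demo").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// The chain of transformations below is the RDD's lineage;
// Spark can replay it to rebuild any evicted partition.
val base   = sc.textFile("data.txt")                     // hypothetical input file
val parsed = base.map(_.split(",")).filter(_.length > 1)

// MEMORY_ONLY: partitions that don't fit are evicted (LRU) rather
// than spilled, and recomputed from lineage when needed again.
parsed.persist(StorageLevel.MEMORY_ONLY)

parsed.count() // first action: computes the RDD and caches what fits
parsed.count() // later action: reuses cached partitions, rebuilds evicted ones
```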
2. I haven't found anything documented about the relationship between LRU eviction and the RDD StorageLevel. However, you can use a different StorageLevel to cache data that doesn't fit into memory. Among the available levels, MEMORY_AND_DISK_SER can help cut down on GC pressure and avoid expensive recomputation.
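For example (reusing the hypothetical `parsed` RDD from the sketch above), switching to MEMORY_AND_DISK_SER stores partitions as serialized bytes and spills the overflow to disk:

```scala
import org.apache.spark.storage.StorageLevel

// A storage level can't be changed while an RDD is persisted,
// so drop the old level first.
parsed.unpersist()

// Serialized storage keeps fewer, larger objects on the heap (less GC),
// and partitions that don't fit in memory are spilled to disk and read
// back later instead of being recomputed from lineage.
parsed.persist(StorageLevel.MEMORY_AND_DISK_SER)
parsed.count()
```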
3. I don't think there will be any issue running Spark on data that is larger than the sum of all executor memory in the cluster. Many operations can stream data through, so memory usage is independent of input data size. In the few cases where a job fails because an individual partition is too large to fit in memory, the usual approach is to repartition into more partitions so that each one is smaller and, hopefully, fits.
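A sketch of that repartitioning approach, again using the hypothetical `parsed` RDD (the factor of 4 is an arbitrary starting point; tune it per job):

```scala
import org.apache.spark.storage.StorageLevel

// More partitions means smaller partitions, so each one is likelier
// to fit in memory on its own.
val finer = parsed.repartition(parsed.getNumPartitions * 4)
finer.persist(StorageLevel.MEMORY_AND_DISK)
finer.count()
```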