So I have a question about RDD persistence. Let's say I have an RDD that's persisted with MEMORY_AND_DISK, and I know that enough memory has since been freed up that the data currently sitting on disk would fit in memory. Is it possible to tell Spark to re-evaluate the RDD's cached storage and move that data into memory?
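To make the question concrete, here's roughly what I have in mind (the RDD is just a placeholder for my real data; as far as I can tell there's no single "move disk blocks to memory" call, so the closest thing I've found is to unpersist and re-persist, which recomputes from lineage rather than actually moving the existing disk blocks):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RepersistSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("repersist-sketch").setMaster("local[*]")) // local just for testing

    // Placeholder RDD standing in for my real data.
    val rdd = sc.parallelize(1 to 1000000)
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count() // materialize; some partitions may spill to disk

    // ... later, once memory has been freed up ...
    // Spark won't let me change the storage level of an already-persisted RDD,
    // so the only route I see is to drop the cache entirely and re-cache.
    rdd.unpersist(blocking = true)        // drop all cached blocks (memory and disk)
    rdd.persist(StorageLevel.MEMORY_ONLY) // ask for memory-only this time
    rdd.count()                           // recomputes from lineage and re-caches

    sc.stop()
  }
}
```

The downside is that the unpersist/re-persist round trip throws away the cached blocks and recomputes everything from lineage, which is exactly the cost I was hoping to avoid.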
Essentially I'm running into an issue where I persist an RDD, but the entire RDD doesn't end up in memory until I've queried it multiple times. This makes the first few runs extremely slow. One thing I'm hoping to try is to initially persist the RDD with MEMORY_AND_DISK and then force the data on disk back into memory, as in the sketch below.
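This is the shape of the warm-up I'm planning to try; the getRDDStorageInfo call is just how I've been checking how much of the RDD actually landed in memory versus on disk (again, the RDD here is a stand-in for my real data):

```scala
// In spark-shell, where `sc` (SparkContext) is predefined.
import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 1000000).map(i => (i, i.toString)) // stand-in data
rdd.persist(StorageLevel.MEMORY_AND_DISK)

// Touch every partition a few times: the first pass may spill some partitions
// to disk, and I'm hoping later passes pull them back into memory.
for (_ <- 1 to 3) rdd.count()

// Check how the cached blocks ended up split between memory and disk.
sc.getRDDStorageInfo
  .filter(_.id == rdd.id)
  .foreach(info => println(s"cached: ${info.numCachedPartitions} partitions, " +
    s"memory: ${info.memSize} B, disk: ${info.diskSize} B"))
```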