
So I have a question about RDD persistence. Say I have an RDD persisted with MEMORY_AND_DISK, and I know that enough memory has since been freed up to fit the partitions that spilled to disk. Is it possible to tell Spark to re-evaluate the RDD's storage and move that data back into memory?

Essentially I'm running into an issue where I persist the RDD, but the entire RDD doesn't end up in memory until I query it multiple times, which makes the first few runs extremely slow. One thing I'm hoping to try is to initially persist the RDD with MEMORY_AND_DISK and then force the disk-resident data back into memory.
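As far as I can tell, the RDD API has no call that promotes disk-resident blocks into memory in place, so the workaround I'm considering is to unpersist and then re-persist at a different storage level, forcing one full pass over the data. A rough sketch of that idea (the sample data and app name here are just placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RepersistSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("repersist-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Placeholder data standing in for the real RDD.
    val rdd = sc.parallelize(1 to 1000000).map(_ * 2)

    // Initial persistence: partitions that don't fit in memory spill to disk.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count() // first action materializes the cache

    // Later, once memory has freed up: drop the cached blocks entirely...
    rdd.unpersist(blocking = true)
    // ...then re-persist memory-only; count() touches every partition, so the
    // RDD is recomputed once and stored fully in memory.
    rdd.persist(StorageLevel.MEMORY_ONLY)
    rdd.count()

    sc.stop()
  }
}
```

The obvious downside is that the unpersist/count cycle recomputes the whole lineage, so this only pays off if the RDD is queried many more times afterward.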

  • By the way, I recommend you read this post first: [Differences between cache and persist](http://stackoverflow.com/questions/28981359/why-do-we-need-to-call-cache-or-persist-on-a-rdd?answertab=active#tab-top) – Alberto Bonsanto Mar 23 '16 at 23:30
  • Thank you @AlbertoBonsanto. I've also added some more specific information about the problem I'm running into, to give a better idea of why I'm asking this question. – Daniel Imberman Mar 23 '16 at 23:35
  • If you have enough memory available, then `MEMORY_AND_DISK` should not put any partitions onto disk in the first place. – Michael Mior May 23 '17 at 11:30

0 Answers