
So I have a question about RDD persistence. Say I have an RDD persisted with MEMORY_AND_DISK, and I know that enough memory has since been freed up to fit the partitions that spilled to disk. Is it possible to tell Spark to re-evaluate the RDD's storage and move that data back into memory?

Essentially I'm running into an issue where I persist the RDD, but the entire RDD doesn't end up in memory until I query it multiple times, which makes the first few runs extremely slow. One thing I'm hoping to try is to initially persist the RDD with MEMORY_AND_DISK and then force the disk-resident data back into memory.
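As far as I can tell, the RDD API has no call that promotes disk-resident blocks into memory in place, so the workaround I'm considering is to unpersist and then re-persist at a different storage level, forcing one full pass over the data. A rough sketch of that idea (the sample data and app name here are just placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RepersistSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("repersist-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Placeholder data standing in for the real RDD.
    val rdd = sc.parallelize(1 to 1000000).map(_ * 2)

    // Initial persistence: partitions that don't fit in memory spill to disk.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count() // first action materializes the cache

    // Later, once memory has freed up: drop the cached blocks entirely...
    rdd.unpersist(blocking = true)
    // ...then re-persist memory-only; count() touches every partition, so the
    // RDD is recomputed once and stored fully in memory.
    rdd.persist(StorageLevel.MEMORY_ONLY)
    rdd.count()

    sc.stop()
  }
}
```

The obvious downside is that the unpersist/count cycle recomputes the whole lineage, so this only pays off if the RDD is queried many more times afterward.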

  • By the way, I recommend you read this post first: [Differences between cache and persist](http://stackoverflow.com/questions/28981359/why-do-we-need-to-call-cache-or-persist-on-a-rdd?answertab=active#tab-top) – Alberto Bonsanto Mar 23 '16 at 23:30
  • Thank you @AlbertoBonsanto. I've also added some more specific information about the problem I'm running into, to give a better idea of why I'm asking this question. – Daniel Imberman Mar 23 '16 at 23:35
  • If you have enough memory available, then `MEMORY_AND_DISK` should not put any partitions onto disk in the first place. – Michael Mior May 23 '17 at 11:30

0 Answers