
I use Spark (in Java) to create an RDD of complex objects. Is it possible to save these objects permanently in memory so I can reuse them with Spark in the future?

(Because Spark cleans up memory after an application or job finishes.)


1 Answer


Spark is not intended as permanent storage; you can use HDFS, Elasticsearch, or another 'Spark-compatible' cluster storage for that.

Spark reads data from cluster storage, does some work in random-access memory (RAM), optionally caching intermediate results, and then usually writes the results back to cluster storage, because there may be too much output for a local hard drive.

Example: Read from HDFS -> Spark ... RDD ... -> Store results in HDFS
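A minimal sketch of that pipeline in Java; the namenode address and paths are placeholders, adjust them for your cluster:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class HdfsRoundTrip {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("HdfsRoundTrip");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Read from cluster storage.
            JavaRDD<String> lines = sc.textFile("hdfs://namenode:8020/input/data.txt");

            // Do the work in RAM.
            JavaRDD<String> upper = lines.map(String::toUpperCase);

            // Write the results back to cluster storage.
            upper.saveAsTextFile("hdfs://namenode:8020/output/result");

            sc.stop();
        }
    }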

You must distinguish between slow persistent storage such as hard drives (spinning disk, SSD) and fast volatile memory such as RAM. The strength of Spark is that it makes heavy use of RAM.

You may use caching for temporary storage, see: (Why) do we need to call cache or persist on a RDD
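As a rough sketch of caching in Java (the path is a placeholder), keeping in mind that the cached data lives only as long as the application:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    SparkConf conf = new SparkConf().setAppName("CacheDemo");
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaRDD<String> lines = sc.textFile("hdfs://namenode:8020/input/data.txt");
    JavaRDD<String> errors = lines.filter(l -> l.contains("ERROR"));

    // Keep the filtered RDD in RAM for the lifetime of this application only.
    errors.persist(StorageLevel.MEMORY_ONLY()); // equivalent to errors.cache()

    long total = errors.count();              // first action computes and caches
    long unique = errors.distinct().count();  // later actions reuse the cache

    sc.stop();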

  • I understand, but databases are not well adapted to storing objects... So would the best solution be to use HDFS to keep the data on disk, plus an in-memory database like Tachyon or Redis for speed when Spark reads the data, even though the object format is not kept? – TiGi Jul 07 '16 at 08:47
  • HDFS works well with Spark; often you do HDFS -> Spark -> HDFS. The point is that you must use something compatible with Spark that can hold large amounts of data, though maybe your Spark output is not as big as the input, so this is not always a requirement. – Christophe Roussy Jul 07 '16 at 09:01
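Regarding keeping the object format from the comments above: one option, not discussed in this thread and only a sketch, is Spark's saveAsObjectFile / objectFile pair, which stores Java-serialized objects on HDFS and reads them back in a later application. MyObject and the paths here are placeholders, and the class must implement java.io.Serializable:

    import java.io.Serializable;
    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // Placeholder complex object; must be serializable.
    public class MyObject implements Serializable {
        public int id;
        public String name;
    }

    SparkConf conf = new SparkConf().setAppName("ObjectFileDemo");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // First application: write the RDD of objects to HDFS.
    JavaRDD<MyObject> rdd = sc.parallelize(Arrays.asList(new MyObject(), new MyObject()));
    rdd.saveAsObjectFile("hdfs://namenode:8020/saved/my-objects");

    // A later application: read the serialized objects back into an RDD.
    JavaRDD<MyObject> restored = sc.objectFile("hdfs://namenode:8020/saved/my-objects");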