Do Spark tables store data permanently, as an RDBMS does, so that the data is always available?

Question

I'm quite new to Spark and am trying to understand its functionality. I come from a database background, and I'm confused by Spark databases and tables. Does Spark also store data permanently on its own and keep it available all the time, as an RDBMS or a NoSQL store does? Or does it just create a reference to the incoming data for the duration of processing, so that the data goes away once processing is over? In short, how is Spark used when we have to process data regularly, in batches or as a continuous stream? What is the time to live for data in Spark tables?

If you're talking about tables you create using `createOrReplaceTempView`, those are tied to the spark session and get removed once the session ends. — philantrovert, Aug 08 '18 at 12:40
Spark supports the Hive Metastore for persistent storage, which is an RDBMS — OneCricketeer, Aug 08 '18 at 13:01

score 0 · Answer 1 · edited Aug 09 '18 at 17:44

Spark is not a database. It does not store data permanently by itself. It's a cluster computing framework/engine, which can also run in a standalone environment. What Spark actually does is pull data from various sources such as HDFS, S3, the local filesystem, RDBMSs, NoSQL stores, etc., and perform analysis or transformations in the memory (RAM) of the worker nodes. It can spill data to local disk if the data does not fit in RAM. Once an action finishes, the intermediate data is discarded. You can cache or persist an RDD/DataFrame, and it will stay available as long as the SparkContext is running (or until you unpersist it). Even cached data is not guaranteed to stay, though: if memory fills up, Spark evicts the least recently used (LRU) cached partitions to make room for others. Memory management is an interesting topic in Spark.
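
The LRU eviction described above can be modelled in a few lines. This is a toy Python sketch, not Spark's actual memory store: the `LruBlockCache` class, the block IDs and the sizes are all hypothetical, and it only illustrates a least-recently-used policy applied to cached partitions under a fixed memory budget.

```python
from collections import OrderedDict


class LruBlockCache:
    """Toy model of an executor cache: a fixed memory budget holding cached
    partitions ("blocks"), evicting the least recently used ones first.
    Hypothetical illustration only; not Spark's real implementation."""

    def __init__(self, capacity_mb: int) -> None:
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        # Ordered by recency: the first item is the least recently used.
        self.blocks: "OrderedDict[str, int]" = OrderedDict()
        self.evicted: list = []

    def get(self, block_id: str) -> bool:
        """Return True on a hit and mark the block as most recently used."""
        if block_id not in self.blocks:
            return False  # miss: the data would have to be recomputed or reread
        self.blocks.move_to_end(block_id)
        return True

    def put(self, block_id: str, size_mb: int) -> bool:
        """Cache a block, evicting LRU blocks until it fits.
        Returns False if the block alone exceeds the budget."""
        if size_mb > self.capacity_mb:
            return False
        if block_id in self.blocks:
            self.used_mb -= self.blocks.pop(block_id)
        while self.used_mb + size_mb > self.capacity_mb:
            victim, victim_size = self.blocks.popitem(last=False)
            self.used_mb -= victim_size
            self.evicted.append(victim)
        self.blocks[block_id] = size_mb
        self.used_mb += size_mb
        return True


cache = LruBlockCache(capacity_mb=100)
cache.put("rdd_1_0", 40)
cache.put("rdd_2_0", 40)
cache.get("rdd_1_0")      # rdd_1_0 becomes the most recently used block
cache.put("rdd_3_0", 40)  # not enough room, so rdd_2_0 is evicted
print(cache.evicted)
print(list(cache.blocks))
```

In real Spark, what happens to an evicted partition depends on the storage level: with `MEMORY_ONLY` it is dropped and recomputed from its lineage when needed again, while with `MEMORY_AND_DISK` it is written to local disk instead.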

