
RDDs provide fault tolerance through their lineage graph; this is how Spark becomes fault tolerant.

So, when working with Spark DataFrames, does Spark create RDDs in the background to provide fault tolerance?

In general, if I perform any operation or transformation on a Spark cluster, does Spark use RDDs?

1 Answer


RDDs are the backbone of Spark and its fundamental data structure. DataFrames and Datasets are built on top of RDDs and provide a higher-level abstraction for simplicity.

Think of an RDD as similar to a Scala collection, but distributed across the cluster.
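To make the lineage idea from the question concrete, here is a minimal, single-process Python sketch of an RDD-like object: a collection split into partitions, where each derived dataset remembers only its parent and the function applied to it. `ToyRDD`, `lose_partition`, and the other names are hypothetical and are not Spark's API. Real Spark spreads partitions across executors and re-reads source data from storage, which this toy does not model.

```python
class ToyRDD:
    """Hypothetical single-process model of an RDD: partitions plus lineage (not Spark's API)."""

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source    # source partitions; stand-in for stable storage (e.g. files)
        self.parent = parent    # lineage: the dataset this one was derived from
        self.fn = fn            # per-element function applied to the parent's elements
        self.materialized = {}  # partition index -> computed elements (can be "lost")

    @classmethod
    def parallelize(cls, data, num_partitions):
        # Split a local list into roughly equal slices (ceiling division for the slice size)
        size = max(1, -(-len(data) // num_partitions))
        return cls(source=[data[i:i + size] for i in range(0, len(data), size)])

    def num_partitions(self):
        if self.source is not None:
            return len(self.source)
        return self.parent.num_partitions()

    def map(self, fn):
        # Lazy: record the lineage edge, compute nothing yet
        return ToyRDD(parent=self, fn=fn)

    def partition(self, i):
        if self.source is not None:
            return list(self.source[i])
        if i not in self.materialized:
            # Missing (never computed, or lost): rebuild from parent partition i via lineage
            self.materialized[i] = [self.fn(x) for x in self.parent.partition(i)]
        return self.materialized[i]

    def lose_partition(self, i):
        # Simulate a failure that drops one computed partition
        self.materialized.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.partition(i)]


base = ToyRDD.parallelize(list(range(1, 7)), num_partitions=3)  # [[1, 2], [3, 4], [5, 6]]
squared = base.map(lambda x: x * x)
doubled = squared.map(lambda x: 2 * x)

print(doubled.collect())  # [2, 8, 18, 32, 50, 72]
squared.lose_partition(1)
doubled.lose_partition(1)
print(doubled.collect())  # [2, 8, 18, 32, 50, 72] again: partition 1 rebuilt from lineage
```

Because `map` only needs the matching parent partition, rebuilding partition `i` touches just the parent's partition `i`. Transformations that shuffle data would need several parent partitions to recompute one child partition; this sketch leaves that case out.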

A DataFrame, in turn, can be thought of as an RDD with a schema (in fact, DataFrames evolved from SchemaRDD), i.e. a distributed two-dimensional collection. Under the hood, Spark does create RDDs, so operations on DataFrames get the same lineage-based fault tolerance.
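The "RDD with a schema" view can be sketched the same way. In this hypothetical Python toy, `ToyDataFrame` holds column names plus partitions of row tuples, and column-level operations are carried out as per-row work over each partition. The `rdd` property is named after PySpark's `DataFrame.rdd`, but everything here is an illustrative assumption; real Spark first optimizes DataFrame queries with its Catalyst optimizer before running them as RDD operations.

```python
class ToyDataFrame:
    """Hypothetical model of a DataFrame as partitioned rows plus a schema (not Spark's API)."""

    def __init__(self, schema, partitions):
        self.schema = list(schema)    # column names, e.g. ["name", "age"]
        self.partitions = partitions  # list of partitions, each a list of row tuples

    @property
    def rdd(self):
        # Without the schema, what remains is a plain partitioned collection of rows
        return self.partitions

    def select(self, *cols):
        positions = [self.schema.index(c) for c in cols]  # resolve column names via the schema
        # The column-level operation runs as a per-row map over every partition
        new_parts = [[tuple(row[p] for p in positions) for row in part]
                     for part in self.partitions]
        return ToyDataFrame(cols, new_parts)

    def filter(self, col, predicate):
        pos = self.schema.index(col)
        # Keep rows whose value in `col` satisfies the predicate, partition by partition
        new_parts = [[row for row in part if predicate(row[pos])] for part in self.partitions]
        return ToyDataFrame(self.schema, new_parts)


people = ToyDataFrame(["name", "age"], [[("Ann", 34), ("Bob", 19)], [("Cid", 42)]])
adults = people.filter("age", lambda a: a >= 21).select("name")
print(adults.schema)  # ['name']
print(adults.rdd)     # [[('Ann',)], [('Cid',)]]
```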

If you're interested in learning more, consider reading these:

- https://www.databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html
- Difference between DataFrame, Dataset, and RDD in Spark

You can also see the underlying RDDs in the Spark UI, in the DAG visualization of a job or stage.

[Image: RDDs in the Spark UI]

Ronak Jain