I am a Spark beginner, and I'm confused about the relationship between Spark RDDs and Spark SQL. Is Spark SQL converted to RDD operations in the background?
- Please refer to the programming guide: http://spark.apache.org/docs/latest/programming-guide.html – mtoto Oct 11 '16 at 12:59
- Possible duplicate of [Difference between DataFrame and RDD in Spark](http://stackoverflow.com/questions/31508083/difference-between-dataframe-and-rdd-in-spark) – Oct 11 '16 at 13:17
1 Answer
As far as I know, they sit atop different engines.

Spark SQL uses an internal component called Catalyst, which is responsible for generating logical and physical query plans and for applying performance optimizations, including code generation.

Because the DataFrame and Dataset APIs are built on top of the Spark SQL engine, they go through Catalyst, which produces an optimized logical and physical query plan.

The RDD API, on the other hand, is low-level and does not benefit from Catalyst's optimizations: you get exactly the computation you wrote.
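You can see Catalyst at work yourself. A minimal sketch (the app name and `local[*]` master are illustrative, not required): `explain(true)` prints the parsed, analyzed, and optimized logical plans plus the physical plan, and the `.rdd` accessor shows that the planned query is ultimately backed by an RDD.

```scala
import org.apache.spark.sql.SparkSession

object CatalystDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("catalyst-demo")   // illustrative name
      .master("local[*]")         // run locally for the demo
      .getOrCreate()
    import spark.implicits._

    // A small DataFrame; operations on it go through the Spark SQL engine.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")

    // Prints the logical plans and the Catalyst-optimized physical plan.
    df.filter($"id" > 1).select($"name").explain(true)

    // The physical plan executes as RDD operations under the hood;
    // the underlying RDD of Rows is exposed via .rdd.
    val lowLevel = df.filter($"id" > 1).rdd
    println(lowLevel.count())

    spark.stop()
  }
}
```

Comparing the "Analyzed Logical Plan" and "Optimized Logical Plan" sections of the `explain` output is a quick way to observe Catalyst rewriting your query before it runs.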

Kristian