I am new to Spark and learning about DataFrames, their operations, and the architecture. While reading comparisons between RDDs and DataFrames, I got confused about the underlying data structure of each. Below are my observations; please correct me if any of them are wrong.
1) An RDD is stored in RAM in a distributed manner (in blocks) across the nodes of a cluster, if the source data lives on a cluster (e.g. HDFS).
If the data source is just a single CSV file, will the data still be distributed across multiple blocks in the RAM of the machine running Spark (e.g. my laptop)? Am I right? (See the first sketch below.)
2) Is there any relationship between a block and a partition? Which one is the superset?
3) DataFrame: Is a DataFrame stored in the same way as an RDD? And will an RDD be created behind the scenes if I load my source data into a DataFrame only? (See the second sketch below.)
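
To make point 1 concrete, here is a minimal sketch of what I was experimenting with in local mode (PySpark; "data.csv" is just a placeholder for my file):

```python
from pyspark import SparkContext

# Running in local mode on my laptop; "data.csv" is a placeholder path.
sc = SparkContext("local[*]", "rdd-partition-test")

# Read a single local CSV file as an RDD of text lines.
rdd = sc.textFile("data.csv")

# I expected this to show how the single file gets split into
# in-memory partitions, but I'm not sure that's the right mental model.
print(rdd.getNumPartitions())
```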
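And for point 3, this is the DataFrame version of the same read; the fact that `df.rdd` even exists is what made me suspect an RDD sits underneath:

```python
from pyspark.sql import SparkSession

# Same local setup; "data.csv" is again a placeholder path.
spark = SparkSession.builder.master("local[*]").appName("df-test").getOrCreate()

# Read the same CSV as a DataFrame instead of an RDD.
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# A DataFrame exposes an underlying RDD via df.rdd -- does that mean
# an RDD is always created behind the scenes even if I only use DataFrames?
print(df.rdd.getNumPartitions())
```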
Thanks in advance :)