I know that I should use Spark Datasets primarily; however, I am wondering whether there are good situations where I should use RDDs instead of Datasets?
In a typical Spark application you should go for the Dataset/DataFrame API. Spark internally optimizes those structures, and they provide high-level APIs for manipulating the data. However, there are situations where RDDs are handy:
- When manipulating graphs using GraphX
- When integrating with third-party libraries that only know how to handle RDDs
- When you want to use the low-level API to have better control over your workflow (e.g. `reduceByKey`, `aggregateByKey`)
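As a minimal sketch of that last point, here is what the two low-level pair-RDD operations mentioned above look like in Scala (the data and the local master are assumptions for illustration; a real application would get its `SparkSession` from its deployment environment):

```scala
import org.apache.spark.sql.SparkSession

object RddLowLevelSketch {
  def main(args: Array[String]): Unit = {
    // Local session purely for illustration
    val spark = SparkSession.builder()
      .appName("rdd-low-level-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (key, value) pairs
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // reduceByKey: sums values per key, combining locally on each
    // partition before shuffling
    val sums = pairs.reduceByKey(_ + _)

    // aggregateByKey: lets you control the per-partition and
    // cross-partition combine steps separately; here it builds a
    // (sum, count) pair per key, from which an average could be derived
    val sumCount = pairs.aggregateByKey((0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )

    sums.collect().foreach(println)     // e.g. (a,4), (b,2)
    sumCount.collect().foreach(println) // e.g. (a,(4,2)), (b,(2,1))
    spark.stop()
  }
}
```

The Dataset API would express the sum as a `groupBy`/`agg`, but `aggregateByKey` gives you explicit control over the zero value and both combine functions, which is exactly the kind of fine-grained control the answer is referring to.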

dumitru