
I know that I should primarily use Spark Datasets; however, I am wondering whether there are situations where I should use RDDs instead of Datasets?

SCouto
jk1

1 Answer


In a typical Spark application you should go for the Dataset/DataFrame API. Spark internally optimizes those structures, and they provide high-level APIs for manipulating the data. However, there are situations where RDDs are handy:

  • When manipulating graphs using GraphX
  • When integrating with third-party libraries that only know how to handle RDDs
  • When you want to use the low-level API for finer control over your workflow (e.g. reduceByKey, aggregateByKey)
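As a sketch of that last point, here is a minimal word count that drops from a Dataset down to the RDD API to call reduceByKey directly (the object name and the `local[*]` master setting are illustrative, not part of the original answer):

```scala
import org.apache.spark.sql.SparkSession

object RddWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-example")
      .master("local[*]") // illustrative: run locally on all cores
      .getOrCreate()

    // Drop to the RDD API for explicit control over the shuffle:
    // reduceByKey combines values map-side before shuffling,
    // something you state directly here rather than relying on
    // the Catalyst optimizer to plan for you.
    val counts = spark.sparkContext
      .parallelize(Seq("a", "b", "a", "c", "b", "a"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

    println(counts) // Map(a -> 3, b -> 2, c -> 1) in some order
    spark.stop()
  }
}
```

The same result is expressible with `groupBy("word").count()` on a Dataset; the RDD version is worth reaching for when you need to pick the exact combine function or partitioning yourself.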
dumitru