0

My understanding is that one of the big changes between Spark 1.x and 2.x was the migration away from DataFrames in favor of the newer, improved Dataset objects.

However, in all the Spark 2.x docs I see DataFrames being used, not Datasets.

So I ask: in Spark 2.x, are we still using DataFrames, or have the Spark folks just not updated their 2.x docs to use the newer, recommended Datasets?

zero323
hotmeatballsoup

2 Answers

0

DataFrames ARE Datasets: a DataFrame is just a special kind of Dataset, namely Dataset[Row], i.e. an untyped Dataset.
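To make that concrete, here is a minimal Scala sketch, assuming Spark 2.x or later on the classpath and a local master. The `Person` case class, the object name and the `demo` helper are made up for illustration. It shows that a DataFrame can be assigned to a `Dataset[Row]` without any conversion, and that `as[T]` gives you a typed Dataset from the same data.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Long)

object DataFrameIsDataset {
  // Builds the same two records as an untyped and as a typed Dataset.
  def demo(spark: SparkSession): (Dataset[Row], Dataset[Person]) = {
    import spark.implicits._

    // toDF() yields a DataFrame ...
    val df: DataFrame = Seq(Person("Ann", 31L), Person("Bob", 45L)).toDF()

    // ... which is accepted where Dataset[Row] is expected, because in the
    // Scala API DataFrame is a type alias for Dataset[Row].
    val rows: Dataset[Row] = df

    // as[T] turns the untyped Dataset into a typed Dataset[Person].
    val people: Dataset[Person] = df.as[Person]

    (rows, people)
  }

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can run standalone.
    val spark = SparkSession.builder()
      .appName("dataframe-is-dataset")
      .master("local[*]")
      .getOrCreate()

    val (rows, people) = demo(spark)
    rows.printSchema()
    people.show()

    spark.stop()
  }
}
```

So whenever the docs use a DataFrame in Scala, they are already using a Dataset, just the untyped one.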

But it's true that even with Spark 2.x, many Spark users still use DataFrames, especially for fast prototyping (I'm one of them), because it's a very convenient API and many operations are (in my view) easier to do with DataFrames than with Datasets.
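As a rough side-by-side of the two styles, here is a small sketch, again assuming Spark 2.x or later. The `Employee` case class and the two helper functions are hypothetical names for illustration. The untyped version refers to columns by name and is checked at run time; the typed version uses lambdas over the case class and is checked by the compiler.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type, used only for this illustration.
case class Employee(dept: String, salary: Double)

object UntypedVsTyped {
  // Untyped (DataFrame) style: columns are referenced by name.
  def wellPaidDeptsUntyped(df: DataFrame): DataFrame =
    df.filter(col("salary") > 1000.0).select("dept")

  // Typed (Dataset) style: fields are accessed on the case class.
  // The session is passed in so its implicit encoders can be imported.
  def wellPaidDeptsTyped(spark: SparkSession, ds: Dataset[Employee]): Dataset[String] = {
    import spark.implicits._
    ds.filter(_.salary > 1000.0).map(_.dept)
  }
}
```

Which one is more convenient is a matter of taste: the untyped style is often shorter for ad hoc work, while the typed style catches a misspelled field name at compile time.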

Raphael Roth
-1

Apparently you can use both, but no one over at Spark has bothered updating the docs to show how to use Datasets, so I'm guessing they really want us to just use DataFrames like we did in 1.x.

hotmeatballsoup