Spark 2.x DataFrames or Datasets?

Question

My understanding is that one of the big changes between Spark 1.x and 2.x was the migration away from DataFrames toward the newer, improved Dataset objects.

However, in all the Spark 2.x docs I see DataFrames being used, not Datasets.

So I ask: in Spark 2.x, are we still using DataFrames, or have the Spark folks just not updated their 2.x docs to use the newer, recommended Datasets?

I see this question as being more about confirming which one to use and not so much about their differences. — hotmeatballsoup, May 10 '18 at 16:49
[Spark 2.0 Dataset vs DataFrame](https://stackoverflow.com/q/40596638/6910411) — zero323, May 13 '18 at 17:09

score 0 · Answer 1 · answered May 10 '18 at 15:51

DataFrames ARE Datasets. A DataFrame is just a special kind of Dataset, namely `Dataset[Row]`, which means an untyped Dataset.

But it's true that even with Spark 2.x, many Spark users still use DataFrames, especially for fast prototyping (I'm one of them), because it's a very convenient API and many operations are (in my view) easier to do with DataFrames than with Datasets.
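To make the `Dataset[Row]` point concrete, here is a minimal Scala sketch, assuming Spark 2.x on the classpath and a local master. The `Person` case class, the object name, and the sample rows are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Long)

object DataFrameVsDataset {
  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust the master for a real cluster.
    val spark = SparkSession.builder()
      .appName("DataFrameVsDataset")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings in encoders, toDS/toDF and the $"col" syntax

    // A typed Dataset: each element is a Person.
    val people: Dataset[Person] = Seq(Person("Ann", 34L), Person("Bob", 19L)).toDS()

    // An untyped DataFrame built from the same data.
    val df: DataFrame = people.toDF()

    // This assignment compiles because, in the Scala API,
    // DataFrame is a type alias for Dataset[Row].
    val sameThing: Dataset[Row] = df

    // Untyped style: the column is referenced by name and resolved at runtime.
    df.filter($"age" > 21).show()

    // Typed style: the field access is checked by the compiler.
    people.filter(_.age > 21).show()

    // Going back from a DataFrame to a typed Dataset.
    val typedAgain: Dataset[Person] = sameThing.as[Person]
    typedAgain.show()

    spark.stop()
  }
}
```

Both filters express the same query. The difference is only whether the column is looked up by name at runtime or checked as a field at compile time.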

Raphael Roth

Ahh ok, so Spark just hasn't updated their docs then? – hotmeatballsoup May 10 '18 at 16:15
No. `DataFrame` is a specific, highly optimized variant of `Dataset` that provides an additional set of features over the generic `Dataset`. It is not deprecated or obsolete, so there is no reason to update the docs. – Alper t. Turker May 10 '18 at 17:17
Oh nice, OK are there any Spark docs/examples off the main site that show how to create or work with `Datasets`? – hotmeatballsoup May 10 '18 at 18:36
No? Weird. That's probably an enormous oversight. – hotmeatballsoup May 11 '18 at 12:46

score -1 · Accepted Answer · answered May 11 '18 at 12:47

Apparently you can use both, but no one over at Spark has bothered updating the docs to show how to use Datasets, so I'm guessing they really want us to just use DataFrames like we did in 1.x.

hotmeatballsoup
