
In Spark, there are often operations like this:

 hiveContext.sql("select * from demoTable").show()

When I look up the show() method in the official Spark API documentation, the result is like this: [screenshot of the search result]. And when I change the keyword to 'Dataset', I find that the method used on a DataFrame belongs to Dataset. How does that happen? Is there any implication?

lec_ssmi

1 Answer


According to the documentation:

A Dataset is a distributed collection of data.

And:

A DataFrame is a Dataset organized into named columns.

So, technically: DataFrame is equivalent to Dataset<Row>

And one last quote:

In the Scala API, DataFrame is simply a type alias of Dataset[Row]. While, in Java API, users need to use Dataset<Row> to represent a DataFrame.

In short, the concrete type is Dataset, which is why methods such as show() are documented under Dataset.
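
To make the alias concrete, here is a minimal sketch, assuming Spark 2.x or later with a local SparkSession. The object name DataFrameAliasDemo, the helper countRows, and the sample data are made up for illustration and do not come from the question.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical demo object; names and data are illustrative only.
object DataFrameAliasDemo {

  // Declared to take Dataset[Row]; a DataFrame can be passed as-is
  // because in Scala DataFrame is a type alias of Dataset[Row].
  def countRows(ds: Dataset[Row]): Long = ds.count()

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this sketch.
    val spark = SparkSession.builder()
      .appName("DataFrameAliasDemo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Build a small DataFrame from an in-memory sequence.
    val df: DataFrame = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // No conversion needed: both names refer to the same type.
    val ds: Dataset[Row] = df

    // show() is defined on Dataset, so it works on either reference.
    ds.show()
    println(countRows(df))

    spark.stop()
  }
}
```

The assignment from df to ds compiles without any cast, which is the practical meaning of the type alias: every Dataset method is available on a DataFrame.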

ernest_k