
I am confused about two types: Dataset[Row] and sql.DataFrame. Various documents say that a DataFrame is nothing but a Dataset[Row]. Then what is sql.DataFrame? Below is code where I see different types returned. Can you please explain the difference between them?

The code below returns Dataset[Row] (according to the method's return type shown in IntelliJ):

serverDf.select(from_json(col("value"), schema) as "event")
  .select("*")
  .filter(col("event.type").isin(eventTypes: _*))

The code snippet below returns sql.DataFrame:

serverDf.select(from_json(col("value"), schema) as "event")
  .select("*")

Thanks in advance

Andronicus
Vindhya G
  • Need the full code to understand and explain. Also see [difference between dataset and dataframe](https://stackoverflow.com/a/39033308/3190018) – user3190018 Apr 28 '20 at 05:13
  • The question is more about sql.DataFrame vs Dataset[Row]. It seems that internally there are two types, and I want to know the difference between them. – Vindhya G Apr 28 '20 at 05:20
  • Not sure why this is marked as a duplicate, since this question is about a type called sql.DataFrame in Scala Spark code rather than DataFrame vs Dataset[Row]. – Vindhya G Apr 28 '20 at 05:25
  • sql.DataFrame and DataFrame are the same; IntelliJ just prefixes the type with the package name, that's all. Moreover, the same answer is available in the other question. – user3190018 Apr 28 '20 at 06:37

1 Answer


They are the same thing, as stated in the documentation:

Each Dataset also has an untyped view called a DataFrame, which is a Dataset of Row.

It's just a type alias:

type DataFrame = Dataset[Row]

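IntelliJ shows different result types because the methods are declared with different return types: select is declared to return DataFrame, while filter is declared to return Dataset[T], which for a DataFrame is Dataset[Row]. The sql prefix is only the package the alias lives in (org.apache.spark.sql).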
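To make the alias point concrete, here is a minimal sketch that does not use Spark at all. Row, Dataset and the select/filter methods below are toy stand-ins I made up for illustration (they are not the real org.apache.spark.sql classes, and the real methods take Column arguments); only the shape of the alias and of the two return types mirrors Spark.

```scala
object AliasSketch {
  // Hypothetical stand-in for org.apache.spark.sql.Row.
  final case class Row(values: Seq[Any])

  // Hypothetical stand-in for org.apache.spark.sql.Dataset.
  final class Dataset[T](val rows: Seq[T]) {
    // Return type written with the alias, the way Spark declares select.
    def select(f: T => Row): DataFrame = new Dataset[Row](rows.map(f))

    // Return type written generically, the way Spark declares filter.
    def filter(p: T => Boolean): Dataset[T] = new Dataset[T](rows.filter(p))
  }

  // Same shape as the alias in Spark's sql package object.
  type DataFrame = Dataset[Row]

  // This line compiles only because both names denote the same type.
  val sameType: DataFrame =:= Dataset[Row] = implicitly[DataFrame =:= Dataset[Row]]

  def demo(): Int = {
    // The IDE would show DataFrame here, because select is declared that way.
    val selected: DataFrame = new Dataset(Seq(1, 2, 3)).select(n => Row(Seq(n)))
    // The IDE would show Dataset[Row] here, because filter returns Dataset[T].
    val filtered: Dataset[Row] = selected.filter(_.values.head != 2)
    // Assigning one to the other needs no cast or conversion.
    val backAgain: DataFrame = filtered
    backAgain.rows.size
  }
}
```

The same holds in real Spark code: a value typed Dataset[Row] can be passed wherever a DataFrame is expected, and the other way round, without any conversion.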

Andronicus
  • I am talking about sql.DataFrame vs Dataset[Row]. My question is why there is a type called sql.DataFrame. – Vindhya G Apr 28 '20 at 05:19
  • @Dhatri did you check my second link? It's a `sql` package, so the type is `sql.DataFrame`. – Andronicus Apr 28 '20 at 05:21
  • The question is about sql.DataFrame and Dataset[Row], not DataFrame vs Dataset. There are two types shown in IntelliJ, and the question was about that. The code is not considered in this answer. I have removed the downvote, though, but this is not the answer I need. – Vindhya G Apr 28 '20 at 05:22
  • @Dhatri the signatures of the methods are different because of the type alias; that's why the type sometimes shows as `Dataset` and sometimes as `DataFrame`. – Andronicus Apr 28 '20 at 05:25
  • Thanks for answering. So internally sql.DataFrame and Dataset[Row] are the same in the implementation? Is there a reason why filter returns Dataset[Row] while select does not? – Vindhya G Apr 28 '20 at 05:27
  • @Dhatri probably implementation details, but yes, they are the same; you can use one where the other is expected. – Andronicus Apr 28 '20 at 05:28
  • Thanks @Andronicus. So if I understand correctly, in the context of the Spark implementation it is just a name in a different package but the same as Dataset[Row]? – Vindhya G Apr 28 '20 at 05:32
  • @Dhatri yes, that's what I was trying to say :) – Andronicus Apr 28 '20 at 05:33