
Somehow, in Spark 2.0, I can use DataFrame.map(r => r.getAs[String]("field")) without problems.

But Dataset.map(r => r.getAs[String]("field")) gives an error saying that r doesn't have a "getAs" method.

What's the difference between r in a Dataset and r in a DataFrame, and why does r.getAs only work with DataFrame?
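A minimal sketch of the situation described above (the case class and field names are hypothetical, assuming Spark 2.x with a local SparkSession):

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical schema for illustration
case class Person(field: String, age: Int)

val spark = SparkSession.builder().master("local[*]").appName("example").getOrCreate()
import spark.implicits._

val df: DataFrame = Seq(Person("a", 1), Person("b", 2)).toDF()
val ds: Dataset[Person] = df.as[Person]

// Compiles: df is a Dataset[Row], and Row has a getAs method
df.map(r => r.getAs[String]("field"))

// Does not compile: r is a Person here, and Person has no getAs method
// ds.map(r => r.getAs[String]("field"))
```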

After doing some research on Stack Overflow, I found a helpful answer here:

Encoder error while trying to map dataframe row to updated row

Hope it's helpful

cozyss
    What is the type of your dataset? `getAs` is a method on `Row`, so you can only use `r.getAs` if `r` is a `Row` (i.e., your dataset is a `Dataset[Row]`; note that `DataFrame` is just an alias for `Dataset[Row]`). – puhlen Sep 06 '17 at 18:39
  • Thanks. I used Dataset[_]. When I do Dataset.map(r => xx), what is r? Is it a row of data? @puhlen – cozyss Sep 06 '17 at 18:43
  • No, `r` is `Any` because you didn't specify its type – puhlen Sep 06 '17 at 18:44

1 Answer


Dataset has a type parameter: class Dataset[T]. T is the type of each record in the Dataset. That T might be anything (well, anything for which you can provide an implicit Encoder[T], but that's beside the point).
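For example (a sketch, using a hypothetical case class), the record type T is fixed when the Dataset is created, and the Encoder comes from spark.implicits:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

case class Person(name: String, age: Int) // hypothetical record type

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._ // brings an implicit Encoder[Person] into scope

// Here T = Person; the compiler finds Encoder[Person] implicitly
val ds: Dataset[Person] = Seq(Person("Ann", 30)).toDS()
```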

A map operation on a Dataset applies the provided function to each record, so the r in the map operations you showed will have the type T.

Lastly, DataFrame is actually just an alias for Dataset[Row], which means each record has the type Row. And Row has a method named getAs that takes a type parameter and a String argument, hence you can call getAs[String]("field") on any Row. For any T that doesn't have this method, this will fail to compile.
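Putting this together, a self-contained sketch (hypothetical names) of both ways to read a field, typed access on a Dataset[T] versus getAs on a DataFrame's Rows:

```scala
import org.apache.spark.sql.{Dataset, Row, SparkSession}

case class Person(name: String, age: Int) // hypothetical record type

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val ds: Dataset[Person] = Seq(Person("Ann", 30), Person("Bob", 25)).toDS()

// Typed access: p is a Person, so use its fields directly
val fromDs: Dataset[String] = ds.map(p => p.name)

// Row access: converting to a DataFrame makes each record a Row, which has getAs
val fromDf: Dataset[String] = ds.toDF().map(r => r.getAs[String]("name"))
```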

Tzach Zohar