I would like to find the last record for each ID in a typed Dataset. I found a DataFrame-based solution in "Find minimum for a timestamp through Spark groupBy dataframe".
But how can I do the same with a typed Dataset?
Something like:
import java.sql.Timestamp
import org.apache.spark.sql.Dataset

case class Person(id: Int, name: String, time: Timestamp, kind: String)
val ds: Dataset[Person] = Seq(
  (1, "Bob", parseDate("03/08/02 00:00:00"), "P"),
  (1, "Bob", parseDate("04/08/02 00:00:00"), "PI"),
  (1, "Bob", parseDate("03/08/02 12:00:00"), "PE"))
  .toDF("id", "name", "time", "kind").as[Person]
ds.groupByKey(_.id)
  .agg(max(_.time), _)
  // .agg(max(struct("time", columnsButTime: _*)) as "all") => works with a DataFrame
  // .select("all.*")