import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Test").master("local").getOrCreate()
import spark.implicits._

spark.sparkContext.parallelize(List(("50", 2L), ("34", 4L))).toDF("cs_p_id", "cs_ed")
  .groupByKey(_.getAs[String]("cs_p_id"))
  .reduceGroups((r1, r2) => Seq(r1, r2).maxBy(_.getAs[Long]("cs_ed")))
  .map(_._2) // error: Unable to find encoder for type stored in a Dataset.
The above won't compile because the map can't find an implicit Encoder[Row].
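For comparison, here is a minimal sketch (using a hypothetical case class `CsRecord` standing in for the row shape above) showing that the same pipeline compiles when the rows are a case class rather than `Row`, because `spark.implicits` derives an `Encoder` for Product types but not for `Row`:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class mirroring the two columns above.
case class CsRecord(cs_p_id: String, cs_ed: Long)

val spark = SparkSession.builder().appName("Test").master("local").getOrCreate()
import spark.implicits._

spark.sparkContext
  .parallelize(List(CsRecord("50", 2L), CsRecord("34", 4L)))
  .toDS() // Dataset[CsRecord], with a derived implicit Encoder
  .groupByKey(_.cs_p_id)
  .reduceGroups((a, b) => Seq(a, b).maxBy(_.cs_ed))
  .map(_._2) // compiles: Encoder[CsRecord] is found implicitly
```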
Surely I'm not the only guy trying to do this simple operation, so what's the way to go?
Thanks
EDIT:
I found this solution, but I can't believe people actually do it this way:
import org.apache.spark.sql.catalyst.encoders.RowEncoder

tableData
  .groupByKey(_.getAs[String]("cs_p_id"))
  .reduceGroups((r1, r2) => Seq(r1, r2).maxBy(_.getAs[Long]("cs_ed")))
  .map(_._2)(RowEncoder(tableData.schema))
This is not a duplicate of Encoder error while trying to map dataframe row to updated row, since I'm just trying to remove duplicates, not transform the rows.
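An untyped alternative that sidesteps encoders entirely is to rank the rows per key with a window function and keep only the top row. A sketch, assuming the same `tableData` DataFrame as above:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// For each cs_p_id, keep only the row with the largest cs_ed.
val w = Window.partitionBy("cs_p_id").orderBy(col("cs_ed").desc)

val deduped = tableData
  .withColumn("rn", row_number().over(w)) // rn = 1 is the max-cs_ed row
  .where(col("rn") === 1)
  .drop("rn")
```

Since this stays in the DataFrame API, the result is still a `Dataset[Row]` and no explicit `Encoder[Row]` is ever needed.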