I have the following DataFrame df in Spark:
+-------+-----+---+
|OrderID| Type|Qty|
+-------+-----+---+
| 571936|62800|  1|
| 571936|62800|  1|
| 571936|62802|  3|
| 661455|72800|  1|
| 661455|72801|  1|
+-------+-----+---+
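
For reproducibility, the sample data can be built like this (a sketch assuming a SparkSession named spark is in scope):

import spark.implicits._

// Sample data as shown above
val df = Seq(
  (571936, 62800, 1),
  (571936, 62800, 1),
  (571936, 62802, 3),
  (661455, 72800, 1),
  (661455, 72801, 1)
).toDF("OrderID", "Type", "Qty")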
I need to select, for each unique OrderID, the row with the largest value of Qty, or the last row per OrderID if all Qty values are equal (e.g. as for 661455). The expected result:
+-------+-----+---+
|OrderID| Type|Qty|
+-------+-----+---+
| 571936|62802|  3|
| 661455|72801|  1|
+-------+-----+---+
Any ideas how to get it?
This is what I tried:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val partitionWindow = Window.partitionBy(col("OrderID")).orderBy(col("Qty").asc)
// over() applies to a window function such as rank(), not to the DataFrame itself
val result = df.withColumn("rank", rank().over(partitionWindow))
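
One approach that appears to give the expected output is to order each partition by Qty descending and keep only the first row. The tie-breaker on Type descending is an assumption: a DataFrame has no inherent row order, so "last row" needs some explicit column to stand in for it.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Highest Qty first; ties broken by Type descending (assumed stand-in for "last row")
val w = Window.partitionBy(col("OrderID")).orderBy(col("Qty").desc, col("Type").desc)

val result = df
  .withColumn("rn", row_number().over(w)) // rn = 1 marks the row to keep per OrderID
  .filter(col("rn") === 1)
  .drop("rn")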