I use Spark 2.1.
If I run the following example:
import org.apache.spark.sql.functions.last
import spark.implicits._

val seq = Seq((123, "2016-01-01", "1"), (123, "2016-01-02", "2"), (123, "2016-01-03", "3"))
val df = seq.toDF("id", "date", "score")
val dfAgg = df.sort("id", "date").groupBy("id").agg(last("score"))
dfAgg.show
dfAgg.show
dfAgg.show
dfAgg.show
dfAgg.show
The output of the above code is:
+---+------------------+
| id|last(score, false)|
+---+------------------+
|123| 1|
+---+------------------+
+---+------------------+
| id|last(score, false)|
+---+------------------+
|123| 2|
+---+------------------+
+---+------------------+
| id|last(score, false)|
+---+------------------+
|123| 1|
+---+------------------+
+---+------------------+
| id|last(score, false)|
+---+------------------+
|123| 3|
+---+------------------+
+---+------------------+
| id|last(score, false)|
+---+------------------+
|123| 3|
+---+------------------+
The intention was to get the score associated with the latest date for each id:
+---+------------------+
| id|last(score, false)|
+---+------------------+
|123| 3|
+---+------------------+
but this clearly hasn't worked, as the result is non-deterministic: sort followed by groupBy does not guarantee any ordering within the groups, so last picks an arbitrary row. Do I have to use window functions to achieve this?
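In case it helps frame the question, this is the window-function version I have in mind (rn and dfLatest are just names I made up, not anything prescribed by Spark); it should deterministically return score 3 for id 123:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Rank rows within each id by date, latest date first.
// The dates are yyyy-MM-dd strings, so lexical order matches chronological order.
val w = Window.partitionBy("id").orderBy(col("date").desc)

// Keep only the top-ranked (latest) row per id.
val dfLatest = df
  .withColumn("rn", row_number().over(w))
  .where(col("rn") === 1)
  .select("id", "score")

dfLatest.show
// +---+-----+
// | id|score|
// +---+-----+
// |123|    3|
// +---+-----+

This works, but it feels heavyweight for what looks like a simple "last value per group" aggregation, hence my question about whether a window is actually required here.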