I am using a simple groupBy query in Scala Spark where the objective is to get the first value of each group in a sorted DataFrame. Here is my Spark DataFrame:
+---------+---------+---------+-------------------+
|       ID|some_flag|some_type|          Timestamp|
+---------+---------+---------+-------------------+
|656565654|     true|   Type 1|2018-08-10 00:00:00|
|656565654|    false|   Type 1|2017-08-02 00:00:00|
|656565654|    false|   Type 2|2016-07-30 00:00:00|
|656565654|    false|   Type 2|2016-05-04 00:00:00|
|656565654|    false|   Type 2|2016-04-29 00:00:00|
|656565654|    false|   Type 2|2015-10-29 00:00:00|
|656565654|    false|   Type 2|2015-04-29 00:00:00|
+---------+---------+---------+-------------------+
Here is my aggregate query:

val sampleDF = df
  .sort($"Timestamp".desc)
  .groupBy("ID")
  .agg(first("Timestamp"), first("some_flag"), first("some_type"))
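In case it helps to reproduce the issue, here is a minimal, self-contained sketch of my setup (the SparkSession construction is simplified, and I am building the Timestamp column by casting strings; assume the real data has the same shape as the table above):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("first-in-group-repro")
  .getOrCreate()
import spark.implicits._

// Recreate the sample data shown above; the Timestamp column is
// cast from strings to an actual timestamp type.
val df = Seq(
  ("656565654", true,  "Type 1", "2018-08-10 00:00:00"),
  ("656565654", false, "Type 1", "2017-08-02 00:00:00"),
  ("656565654", false, "Type 2", "2016-07-30 00:00:00"),
  ("656565654", false, "Type 2", "2016-05-04 00:00:00"),
  ("656565654", false, "Type 2", "2016-04-29 00:00:00"),
  ("656565654", false, "Type 2", "2015-10-29 00:00:00"),
  ("656565654", false, "Type 2", "2015-04-29 00:00:00")
).toDF("ID", "some_flag", "some_type", "Timestamp")
  .withColumn("Timestamp", $"Timestamp".cast("timestamp"))

// The query in question: sort, then take the first row per group.
val sampleDF = df
  .sort($"Timestamp".desc)
  .groupBy("ID")
  .agg(first("Timestamp"), first("some_flag"), first("some_type"))

sampleDF.show(false)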
The expected result is:

+---------+---------+---------+-------------------+
|       ID|some_flag|some_type|          Timestamp|
+---------+---------+---------+-------------------+
|656565654|     true|   Type 1|2018-08-10 00:00:00|
+---------+---------+---------+-------------------+
But I am getting the following weird output instead, and it keeps changing between runs, as if a random row from the group were picked each time:
+---------+---------+---------+-------------------+
|       ID|some_flag|some_type|          Timestamp|
+---------+---------+---------+-------------------+
|656565654|    false|   Type 2|2015-10-29 00:00:00|
+---------+---------+---------+-------------------+
Please also note that there are no nulls in the DataFrame. I am scratching my head trying to figure out where I am going wrong. Need help!