I have the following dataframe, but I cannot work out how to extract all the columns of the first row of each group.
+--------------------+------------+--------+
|           timestamp|       nanos|file_idx|
+--------------------+------------+--------+
|2018-09-07 05:00:...|    64044267|       1|
|2018-09-07 05:00:...|    64044267|       2|
|2018-09-07 05:00:...|    58789223|       3|
+--------------------+------------+--------+
How do I extract the row with the largest file_idx for each combination of timestamp and nanos? I've tried using groupBy, but the result only contains the columns in my group-by clause (plus any aggregates), whereas in reality this table has 160 columns and I need all of them.
The desired outcome for the above example would be:
+--------------------+------------+--------+
|           timestamp|       nanos|file_idx|
+--------------------+------------+--------+
|2018-09-07 05:00:...|    64044267|       2|
|2018-09-07 05:00:...|    58789223|       3|
+--------------------+------------+--------+
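To pin down the selection being asked for, here is a rough sketch in Python. The post does not say whether the dataframe comes from PySpark or Scala, so Python is an assumption. The function names `max_row_per_group` and `max_row_per_group_spark` and the helper column `_rn` are hypothetical, made up for illustration. The first function is a plain-Python reference for the intended semantics; the second shows one common way this kind of selection is expressed in PySpark (a window with `row_number`), which keeps every column of the winning row. It is untested against the real 160-column table.

```python
def max_row_per_group(rows, key_cols, order_col):
    """Plain-Python reference: keep, per key, the whole row with the largest order_col."""
    best = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        # Replace the stored row only when this one has a strictly larger order_col.
        if key not in best or row[order_col] > best[key][order_col]:
            best[key] = row
    return list(best.values())


def max_row_per_group_spark(df, key_cols, order_col):
    """Same selection on a Spark DataFrame (assumes pyspark is installed)."""
    from pyspark.sql import Window, functions as F

    # Rank rows inside each group, largest order_col first.
    w = Window.partitionBy(*key_cols).orderBy(F.col(order_col).desc())
    return (
        df.withColumn("_rn", F.row_number().over(w))  # "_rn" is a hypothetical temp column
        .filter(F.col("_rn") == 1)                    # keep the top-ranked row per group
        .drop("_rn")
    )


# Sample rows copied from the table above; the timestamp is kept exactly
# as displayed (truncated), since its full value is not shown in the post.
rows = [
    {"timestamp": "2018-09-07 05:00:...", "nanos": 64044267, "file_idx": 1},
    {"timestamp": "2018-09-07 05:00:...", "nanos": 64044267, "file_idx": 2},
    {"timestamp": "2018-09-07 05:00:...", "nanos": 58789223, "file_idx": 3},
]
result = max_row_per_group(rows, ["timestamp", "nanos"], "file_idx")
for row in result:
    print(row)
```

With a real dataframe `df`, the Spark variant would be called as `max_row_per_group_spark(df, ["timestamp", "nanos"], "file_idx")`. If two rows in a group tie on file_idx, `row_number` picks one of them arbitrarily, so an extra ordering column would be needed for a deterministic choice.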