I have a pyspark dataframe with 1.6 million records. I sorted it and then grouped by, hoping the sort order would be preserved so that I could select the last value of the sorted column in each group. However, it seems the sort order is not necessarily preserved by the groupBy. Should I use a pyspark Window instead of sort + groupBy?
output_data = input_data.sort(F.col("id"))\
.sort(F.col("date").asc())\
.groupBy("id").agg(F.last("date").alias("date"))