I would like to filter my dataframe to only keep rows with the max value in column some_date.
df.filter(F.col('some_date') == F.max('some_date')) fails, as max is not used in an aggregate.
I also tried to just get the max_date value to then use it in the filter: max_date = df.groupBy().max('some_date'), which failed, telling me that "some_date" is not a numeric column. Aggregation function can only be applied on a numeric column.
In SQL, I would achieve this with a subquery (to the effect of where some_date = (select max(some_date) from ...), but I thought there would be a better way to structure it in Python.