Let's say I have the following spark dataframe (df):
As can be seen, the "Timestamp" column contains duplicate values, and I want to remove them so that each 'Timestamp' value appears in only one row.
I tried to remove the duplicates with this line of code:
df = df.dropDuplicates(['Timestamp'])  # dropDuplicates() returns a new DataFrame, so the result must be assigned
It seems that dropDuplicates() retains the first row of each group of duplicates, but I need it to keep the last row of each group instead (the rows highlighted in the table). How can this be done?