I have a dataframe as follows:
x1 x2 x3 x4
X A B 4
Y C B 2
Z A C 1
X C B 5
I want to remove such rows that are duplicates based on the values of columns x2 and x3. The one with a bigger value on column x4 will be kept. So in this case the result would be the following:
x1 x2 x3 x4
X A B 4
Z A C 1
X C B 5
The second row was removed because the pairs (C, B) appeared twice and the fourth row was kept because 5 > 2 on column x4.
I have the following code that seems to remove the duplicates correctly but does not keep the column x1 in the end result:
result <- df%>%group_by(x2, x3)%>%summarise(x4=max(x4))
x2 x3 x4
A B 4
A C 1
C B 5
How can all columns x1 ... x4 be kept in the end result?