I have a dataframe:
import pandas as pd
data = pd.DataFrame({"col1": ["a", "a", "a", "a", "a", "a"],
                     "col2": [0, 0, 0, 1, 1, 1],
                     "col3": [1, 2, 3, 4, 5, 6]})
data
  col1  col2  col3
0    a     0     1
1    a     0     2
2    a     0     3
3    a     1     4
4    a     1     5
5    a     1     6
I'm trying to remove duplicates only among the rows where col2 == 1, keeping the last entry.
With the code below I was able to keep the first entry and drop the others:
data[~(data.duplicated(["col2"]) & data.col2.eq(1))]
  col1  col2  col3
0    a     0     1
1    a     0     2
2    a     0     3
3    a     1     4
How can I remove duplicates for only one category in a column and keep the last entry?
Desired output:
  col1  col2  col3
0    a     0     1
1    a     0     2
2    a     0     3
3    a     1     6
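
One possible approach, as a minimal sketch rather than a definitive answer: `duplicated` marks every occurrence except the first by default, which is why the attempt above keeps the first row. Passing `keep="last"` marks every occurrence except the last instead. The `reset_index(drop=True)` call is an assumption, added only because the desired output shows the surviving row at index 3 rather than its original index 5; the names `is_dup` and `result` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({"col1": ["a", "a", "a", "a", "a", "a"],
                     "col2": [0, 0, 0, 1, 1, 1],
                     "col3": [1, 2, 3, 4, 5, 6]})

# Flag every row whose col2 value appears again later,
# i.e. everything except the last occurrence of each value.
is_dup = data.duplicated(["col2"], keep="last")

# Drop a row only if it is a duplicate AND belongs to the col2 == 1 category.
result = data[~(is_dup & data["col2"].eq(1))]

# Assumption: renumber the index so it runs 0..n-1 like the desired output.
result = result.reset_index(drop=True)

print(result)
```

If the original index labels should be preserved, the `reset_index` line can simply be left out.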