I have a df and want to remove all duplicates on ID.
   Name     Symbol  ID
0  ZOO INC  Remove  88579Y101
1  Zoo Inc  ZZZ     88579Y101
2  A Inc    AAA     90138A103
3  a inc.   Remove  90138A103
4  2U Inc   TWUO    90214J101
5  Keep     Remove  111111111
But I only want to remove the duplicate rows where Symbol == 'Remove'. The output should look like:
   Name     Symbol  ID
0  Zoo Inc  ZZZ     88579Y101
1  A Inc    AAA     90138A103
2  2U Inc   TWUO    90214J101
3  Keep     Remove  111111111
I can't use result_df = df.drop_duplicates(subset=['ID'], keep='first') (or keep='last') because the 'Remove' rows don't sit in a consistent position within each group of duplicates: sometimes they come first, sometimes last. Sorting alphabetically first won't help either.
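To make the problem concrete, here is a small sketch that rebuilds the sample frame and tries both keep options. The variable names first and last are only illustrative, and I'm assuming all three columns hold plain strings.

```python
import pandas as pd

# Sample data from the table above (all columns assumed to be strings).
df = pd.DataFrame({
    "Name": ["ZOO INC", "Zoo Inc", "A Inc", "a inc.", "2U Inc", "Keep"],
    "Symbol": ["Remove", "ZZZ", "AAA", "Remove", "TWUO", "Remove"],
    "ID": ["88579Y101", "88579Y101", "90138A103", "90138A103",
           "90214J101", "111111111"],
})

# Keep the first row of each ID group, then the last row of each ID group.
first = df.drop_duplicates(subset=["ID"], keep="first")
last = df.drop_duplicates(subset=["ID"], keep="last")

print(first["Symbol"].tolist())
print(last["Symbol"].tolist())
```

Either way, a 'Remove' row survives inside a duplicated group: row 0 with keep='first', and row 3 with keep='last'.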
And while I know I can replace all Remove with NaN, and then use the solution provided here, I am looking for an alternate solution because I may eventually need to pass a list of strings.
Does pandas support anything like: result_df = df.drop_duplicates(subset=['ID'], keep=(df['Symbol'] != 'Remove'))?
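As far as I know, keep only accepts 'first', 'last' or False, so the call above would not work as written. The behaviour I'm after could probably be sketched with a boolean mask instead. This is only a sketch under my own assumptions, not a confirmed pandas idiom for the problem, and the names to_remove, is_dup and result_df are illustrative.

```python
import pandas as pd

# Sample data from the question (all columns assumed to be strings).
df = pd.DataFrame({
    "Name": ["ZOO INC", "Zoo Inc", "A Inc", "a inc.", "2U Inc", "Keep"],
    "Symbol": ["Remove", "ZZZ", "AAA", "Remove", "TWUO", "Remove"],
    "ID": ["88579Y101", "88579Y101", "90138A103", "90138A103",
           "90214J101", "111111111"],
})

# Symbols that mark a row as droppable; a list, so more strings can be added.
to_remove = ["Remove"]

# True for every row whose ID occurs more than once (keep=False marks all copies).
is_dup = df["ID"].duplicated(keep=False)

# Drop only the rows that are both duplicated on ID and flagged by Symbol.
result_df = df[~(is_dup & df["Symbol"].isin(to_remove))].reset_index(drop=True)

print(result_df)
```

One caveat with this approach: if every row in a duplicated ID group has a Symbol in to_remove, the whole group is dropped, which may or may not be what is wanted.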