I have a dataset :
id url keep_if_dup
1 A.com Yes
2 A.com Yes
3 B.com No
4 B.com No
5 C.com No
I want to remove duplicates, i.e. keep first occurence of "url" field, BUT keep duplicates if the field "keep_if_dup" is YES.
Expected output :
id url keep_if_dup
1 A.com Yes
2 A.com Yes
3 B.com No
5 C.com No
What I tried :
Dataframe=Dataframe.drop_duplicates(subset='url', keep='first')
which of course does not take into account "keep_if_dup" field. Output is :
id url keep_if_dup
1 A.com Yes
3 B.com No
5 C.com No