Pandas dataframe - duplicates in data but dups don't reside in same columns

Question

I have a df where there are duplicate rows in aggregate but in this form:

timestamp   animal_1  animal_2  
2020-06-28  14:28:57  dog fox    
2020-06-28  14:28:57  fox dog   
2020-06-29  18:28:57  dog fox   
2020-06-29  18:28:57  fox dog   
2020-06-30  17:35:57  dog fox   
2020-06-30  17:35:57  fox dog

I only want to keep the rows that have a unique timestamp followed by a single combination of both animals. From the above df I would only want to return the following:

timestamp   animal_1  animal_2   
2020-06-28  14:28:57  dog fox    
2020-06-29  18:28:57  fox dog  
2020-06-30  17:35:57  dog fox

What matters is that I return the number of times these 2 animals have interacted.

I have tried multiple sorting, grouping options using pandas but have had no luck.

Does this answer your question? [Drop all duplicate rows in Python Pandas](https://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas) — sushanth, Jul 12 '20 at 15:57

score 1 · Accepted Answer · answered Jul 12 '20 at 15:57

1

First we need sort the column animals , the drop_duplicates

df[['animal_1', 'animal_2']]=np.sort(df[['animal_1', 'animal_2']].values, axis=1)
df=df.drop_duplicates()

answered Jul 12 '20 at 15:57

BENY

317,841
20
164
234

thanks for the code, I tested it and it works perfectly. Appreciate the help. – Mike Jul 13 '20 at 01:11

Pandas dataframe - duplicates in data but dups don't reside in same columns

1 Answers1