0

I have a df where there are duplicate rows in aggregate but in this form:

timestamp   animal_1  animal_2  
2020-06-28  14:28:57  dog fox    
2020-06-28  14:28:57  fox dog   
2020-06-29  18:28:57  dog fox   
2020-06-29  18:28:57  fox dog   
2020-06-30  17:35:57  dog fox   
2020-06-30  17:35:57  fox dog  

I only want to keep the rows that have a unique timestamp followed by a single combination of both animals. From the above df I would only want to return the following:

timestamp   animal_1  animal_2   
2020-06-28  14:28:57  dog fox    
2020-06-29  18:28:57  fox dog  
2020-06-30  17:35:57  dog fox  

What matters is that I return the number of times these 2 animals have interacted.

I have tried multiple sorting, grouping options using pandas but have had no luck.

ardito.bryan
  • 429
  • 9
  • 22
Mike
  • 3
  • 1
  • 1
    Does this answer your question? [Drop all duplicate rows in Python Pandas](https://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas) – sushanth Jul 12 '20 at 15:57

1 Answers1

1

First we need sort the column animals , the drop_duplicates

df[['animal_1', 'animal_2']]=np.sort(df[['animal_1', 'animal_2']].values, axis=1)
df=df.drop_duplicates()
BENY
  • 317,841
  • 20
  • 164
  • 234