This is my data frame:

Fruits         Person        Eat

Banana         Peter         Yes 
Banana         Ashley        Yes
Strawberry     Peter         No
Strawberry     Ashley        Yes 
Cherry         Peter         Yes
Orange         Peter         No
Orange         Ashley        No
Grape          Ashley        Yes
Pear           Ashley        Yes
Pear           Peter         Yes

There are duplicate fruits in my data frame. I need to delete the duplicates based on the following logic. If there is a duplicate fruit and Peter and Ashley both eat it, then Peter's row is kept and Ashley's row is deleted. If there is a duplicate fruit and Peter doesn't eat it and Ashley eats it, then Peter's row is deleted and Ashley's row remains. If there is a duplicate fruit and Peter doesn't eat it and Ashley doesn't eat it, then both rows are deleted.

With this logic, the resulting data frame should look like this:

Fruits         Person        Eat

Banana         Peter         Yes 
Strawberry     Ashley        Yes 
Cherry         Peter         Yes
Grape          Ashley        Yes
Pear           Peter         Yes

I'm not sure how to iterate through a pandas data frame with these conditions to delete duplicates. Generally, for the first condition I would do something like this:

data = [
    {
        "fruit": "Apple",
        "person": "Ashley",
        "eats": True
    },
    {
        "fruit": "Apple",
        "person": "Peter",
        "eats": True
    }
]
eats = dict()

for i, row in enumerate(data):
    fruit = row["fruit"]
    person = row["person"]
    does_eat = row["eats"]

    # if person does eat, record row number for later deletion if needed
    if does_eat:
        eats.setdefault(person, dict())[fruit] = i

# dedup: if Peter and Ashley both eat a fruit, drop Ashley's row
# (test membership with "in", since row number 0 is falsy, and build
# a new list instead of popping from data while iterating over it)
drop = {
    i
    for fruit, i in eats.get("Ashley", {}).items()
    if fruit in eats.get("Peter", {})
}
data = [row for i, row in enumerate(data) if i not in drop]

Any help/tips on how to do this with my data frame would be very appreciated.

dfahsjdahfsudaf
  • Possible duplicate of [Removing duplicates from Pandas dataFrame with condition for retaining original](https://stackoverflow.com/questions/33042777/removing-duplicates-from-pandas-dataframe-with-condition-for-retaining-original) – Jarred Parr Sep 20 '19 at 23:51
  • what happens to the fruit that is neither duplicate nor eaten? – Quang Hoang Sep 21 '19 at 01:32
  • If it's not a duplicate, the row remains in the data frame. I need to write a function that just addresses the duplicates in the data frame. – dfahsjdahfsudaf Sep 21 '19 at 02:38

1 Answer

Try this:

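# Keep only the eaten rows, sort so that Peter comes after Ashley,
# then keep the last row per fruit (Peter's row when both eat it).
df1 = (df[df.Eat.eq('Yes')].sort_values('Person')
                           .drop_duplicates(subset='Fruits', keep='last'))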

Out[14]:
       Fruits  Person  Eat
3  Strawberry  Ashley  Yes
7       Grape  Ashley  Yes
0      Banana   Peter  Yes
4      Cherry   Peter  Yes
9        Pear   Peter  Yes
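
Note that filtering on Eat first also drops any fruit that appears only once and is not eaten. The sample data has no such row, but the comments say non-duplicate rows should stay. A minimal sketch of a variant that leaves non-duplicates alone is below. The "Kiwi" row and the names is_dup, dups and result are hypothetical, added only for illustration, and the sort still relies on "Ashley" sorting before "Peter" alphabetically.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row ("Kiwi")
# that is neither duplicated nor eaten, to illustrate the edge case.
df = pd.DataFrame({
    "Fruits": ["Banana", "Banana", "Strawberry", "Strawberry", "Cherry",
               "Orange", "Orange", "Grape", "Pear", "Pear", "Kiwi"],
    "Person": ["Peter", "Ashley", "Peter", "Ashley", "Peter",
               "Peter", "Ashley", "Ashley", "Ashley", "Peter", "Peter"],
    "Eat": ["Yes", "Yes", "No", "Yes", "Yes",
            "No", "No", "Yes", "Yes", "Yes", "No"],
})

# True for every row whose fruit appears more than once.
is_dup = df.duplicated(subset="Fruits", keep=False)

# Among duplicated fruits, keep only eaten rows; sorting by Person puts
# Peter after Ashley, so keep="last" prefers Peter when both eat it.
dups = (df[is_dup & df["Eat"].eq("Yes")]
        .sort_values("Person")
        .drop_duplicates(subset="Fruits", keep="last"))

# Non-duplicated rows are kept as they are, eaten or not.
result = pd.concat([df[~is_dup], dups]).sort_index()
print(result)
```

On the question's original ten rows this selects the same rows as df1 above, only in the original row order.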
Andy L.