This is my data frame:
Fruits Person Eat
Banana Peter Yes
Banana Ashley Yes
Strawberry Peter No
Strawberry Ashley Yes
Cherry Peter Yes
Orange Peter No
Orange Ashley No
Grape Ashley Yes
Pear Ashley Yes
Pear Peter Yes
There are duplicate fruits in my data frame. I need to delete the duplicates based on the following logic. If there is a duplicate fruit and Peter and Ashley both eat it, then Peter's row is kept and Ashley's row is deleted. If there is a duplicate fruit and Peter doesn't eat it and Ashley eats it, then Peter's row is deleted and Ashley's row remains. If there is a duplicate fruit and Peter doesn't eat it and Ashley doesn't eat it, then both rows are deleted.
With this logic, the resulting data frame should look like this:
Fruits Person Eat
Banana Peter Yes
Strawberry Ashley Yes
Cherry Peter Yes
Grape Ashley Yes
Pear Peter Yes
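For reference, here is a minimal sketch of one way the rules above might be expressed in pandas without iterating row by row. It assumes the column names shown in the table (Fruits, Person, Eat) and that Eat holds the strings "Yes" and "No". The names dedupe_fruits, PRIORITY and _rank are hypothetical, introduced only for illustration. Two cases are not covered by the rules and are handled by assumption: fruits that appear only once are left untouched, and if Peter eats a duplicate fruit but Ashley does not, Peter's row is kept.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruits": ["Banana", "Banana", "Strawberry", "Strawberry", "Cherry",
               "Orange", "Orange", "Grape", "Pear", "Pear"],
    "Person": ["Peter", "Ashley", "Peter", "Ashley", "Peter",
               "Peter", "Ashley", "Ashley", "Ashley", "Peter"],
    "Eat": ["Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes"],
})

# Hypothetical ranking: a lower number wins when both people eat the fruit.
PRIORITY = {"Peter": 0, "Ashley": 1}


def dedupe_fruits(frame: pd.DataFrame) -> pd.DataFrame:
    # Mark every row whose fruit appears more than once.
    is_dup = frame.duplicated("Fruits", keep=False)

    # Assumption: fruits that appear only once are kept as they are.
    singles = frame[~is_dup]

    # Among duplicated fruits, only rows where the fruit is eaten can survive,
    # so a fruit that nobody eats disappears entirely.
    eaten = frame[is_dup & (frame["Eat"] == "Yes")]

    # Put Peter's rows first, then keep the first remaining row per fruit.
    kept = (
        eaten.assign(_rank=eaten["Person"].map(PRIORITY))
        .sort_values("_rank", kind="stable")
        .drop_duplicates("Fruits")
        .drop(columns="_rank")
    )

    # Restore the original row order.
    return pd.concat([singles, kept]).sort_index()


result = dedupe_fruits(df)
print(result.to_string(index=False))
```

On the sample data this keeps Banana/Peter, Strawberry/Ashley, Cherry/Peter, Grape/Ashley and Pear/Peter, which matches the expected output listed above. Filtering on Eat first and then ranking the people avoids writing a separate branch for each rule.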
I'm not sure how to iterate through a pandas data frame with these conditions to delete duplicates. Generally, for the first condition I would do something like this:
data = [
    {
        "fruit": "Apple",
        "person": "Ashley",
        "eats": True
    },
    {
        "fruit": "Apple",
        "person": "Peter",
        "eats": True
    }
]
eats = dict()
to_delete = set()
for i, row in enumerate(data):
    fruit = row["fruit"]
    person = row["person"]
    does_eat = row["eats"]
    # make sure each person has an entry
    if person not in eats:
        eats[person] = dict()
    # if person does eat, record row number for later deletion if needed
    if does_eat:
        eats[person][fruit] = i
    # dedup: when both eat the fruit, mark Ashley's row for deletion
    if fruit in eats.get("Peter", {}) and fruit in eats.get("Ashley", {}):
        to_delete.add(eats["Ashley"][fruit])
# delete after the loop so the recorded row numbers stay valid
data = [row for i, row in enumerate(data) if i not in to_delete]
Any help/tips on how to do this with my data frame would be very appreciated.