I have a dataframe in pandas in python with criminal charges and Person IDs. Some charges are listed twice for the same ID because an individual had 2 or more counts of the same charge. Some are listed twice because it was tried once in a district court, then moved to a superior court. Grouping by individual, I'd like to identify these charges, then drop the version from the district court.
Here is a sample dataset:
d = {'ID': [1, 1, 1, 1, 2, 2, 2, 3, 3],
'Court': ['district', 'district', 'district', 'superior', 'district', 'district', 'superior', 'district', 'superior'],
'Charge': ['larceny', 'larceny', 'assault', 'assault', 'oui', 'larceny', 'assault', 'assault', 'assault']}
df = pd.DataFrame(data=d)
ID Court Charge
0 1 district larceny
1 1 district larceny
2 1 district assault
3 1 superior assault
4 2 district oui
5 2 district larceny
6 2 superior assault
7 3 district assault
8 3 superior assault
I can identify duplicates by ID and Charge. But it needs to be a duplicate by ID and Charge but have a different value for Court.
This is the result I would like:
ID Court Charge
0 1 district larceny
1 1 district larceny
2 1 superior assault
3 2 district oui
4 2 district larceny
5 2 superior assault
6 3 superior assault
I can identify 'duplicate' charges using duplicated(), but am not sure how to make sure the duplicates have two different values for Court before dropping. Thanks for any help!