I have a DataFrame like this:
df = pd.DataFrame([
{'id': '001', 'score': 0.1, 'mat': np.array([2, 2, 0, 2, 0, 0, 0, 0, 0])},
{'id': '001', 'score': 0.3, 'mat': np.array([0, 3, 0, 0, 3, 0, 0, 0, 0])},
{'id': '001', 'score': 0.5, 'mat': np.array([0, 1, 0, 0, 0, 0, 0, 0, 0])},
{'id': '002', 'score': 0.1, 'mat': np.array([2, 0, 0, 0, 0, 0, 2, 0, 0])},
{'id': '002', 'score': 0.2, 'mat': np.array([0, 0, 4, 0, 0, 0, 0, 4, 0])},
])
What I want is to combine the mat
with same id
with following rules:
- If only one non-zero at the same location, then choose that value as new
mat
; - If there're multiple of them, chose the one with highest score.
Here's the result I expected:
id mat
001 [2, 1, 0, 2, 3, 0, 0, 0, 0]
002 [2, 0, 4, 0, 0, 0, 2, 4, 0]
What I've tried:
df.drop(columns=['score']).groupby(['id']).sum()
This will ignore the score
then add all mat
under same id, it's correct if only one non-zero at the same location.
df.drop(columns=['score']).groupby(['id']).agg('sum')
This will yield the same wrong result ([2, 9, 0, 2, 3, 0, 0, 0, 0]
for id 001). However, I wonder if there's anyway to write a custom function for .agg()
to achieve my goal?
Thanks for your help!