I have a Dataframe where I want to groupby
a couple conditions and then sum a few fields. When I sum the values I want to flatten the values
With an input of:
UserID Site First Name Last Name May Hours June Hours July Hours
6385 A John Doe 8 9 10
6385 A John Doe 3 7 8
4756 A Mark Mean 7 6 5
4756 B Mark Mean 3 2 3
and grouping by UserID
and Site
, I want an output of:
UserID Site First Name Last Name May Hours June Hours July Hours
6385 A John Doe 11 16 18
4756 A Mark Mean 7 6 5
4756 B Mark Mean 3 2 3
It can be assumed that besides the UserID, Site, and the Monthly Hours, all other column values are the same.
My current idea was to do:
for group_name, group_df in df.groupby(by=['UserID','Site']):
if len(group_df) > 1: # We have rows that need to be summed
sums = group_df[['May Hours','June Hours','July Hours']].sum()
# Assign this sum to the first row in the grouped_df
# Remove all following rows in grouped_df (we already got their sum)
# Append this to a new dataframe
else:
# Append this to a new dataframe
While this solution may work, it is definitely not efficient but I am unsure of the best way to flatten the dataframe