-1

I have a DataFrame with columns similar to:

enter image description here

I want to use 'pd.groupby' to group rows according to ID column. Additionally, I want to use '.agg()' for applying functions to each column.

For the columns with scores, I want to apply 'np.average'. For example, for the column 'Reliability Score Flow A', the weights are in 'Flow A' column, for the column 'Reliability Score Flow B' in 'Flow B' and so on. On the other hand, for the columns with Flows, I only want to use sum.

So, the expected output would be something like:

enter image description here

How do you do it?

Thank you,

1 Answers1

0

Create a dictionary show how to aggregate each column.

dd = {k:'mean' for k in df.filter(regex='^Flow.*').columns.tolist()}
for i in df.filter(like='Relia'):
    dd[i] = 'sum'
dd

Output:

{'Flow A': 'mean',
 'Flow B': 'mean',
 'Flow C': 'mean',
 'Flow D': 'mean',
 'Flow E': 'mean',
 'Reliabilty Score Flow A': 'sum',
 'Reliabilty Score Flow B': 'sum',
 'Reliabilty Score Flow C': 'sum',
 'Reliabilty Score Flow D': 'sum',
 'Reliabilty Score Flow E': 'sum'}

Then use groupby with agg and dictionary, dd:

df.groupby('ID').agg(dd).reindex(df.columns[1:], axis=1)
Scott Boston
  • 147,308
  • 15
  • 139
  • 187