Group Rows in a DataFrame

Question

I have a DataFrame with columns similar to:

I want to use 'df.groupby' to group the rows by the 'ID' column. Additionally, I want to use '.agg()' to apply a function to each column.

For the score columns, I want a weighted average with 'np.average': for the column 'Reliability Score Flow A', the weights are in the 'Flow A' column; for 'Reliability Score Flow B', they are in 'Flow B'; and so on. For the Flow columns, on the other hand, I only want the sum.
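For reference, within a single ID group the weighted score is sum(score * flow) / sum(flow), which is what np.average computes when the flow values are passed as weights=. A quick check with made-up numbers (hypothetical, not the asker's data):

```python
import numpy as np

# Hypothetical values for the two rows of one ID group
scores = np.array([80.0, 90.0])  # e.g. 'Reliability Score Flow A'
flows = np.array([1.0, 3.0])     # e.g. 'Flow A', used as the weights

# np.average with weights computes sum(scores * flows) / sum(flows)
weighted = np.average(scores, weights=flows)
manual = (scores * flows).sum() / flows.sum()
print(weighted, manual)  # 87.5 87.5
```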

So, the expected output would be something like:

[Image: expected output, not available]

How can I do this?

Thank you,

Have you read the [docs](http://pandas.pydata.org/pandas-docs/stable/groupby.html)? — jeschwar, Jan 16 '19 at 20:07
Can you include a sample of your dataframes as well as your expected output? — rahlf23, Jan 16 '19 at 20:28
@jeschwar, yes, I have. However, it's my first time using pandas, and I am learning by doing. Thank you for the info. — jodhernandezbe, Jan 16 '19 at 21:51
Possible duplicate of [Python Pandas : group by in group by and average?](https://stackoverflow.com/questions/30328646/python-pandas-group-by-in-group-by-and-average) — BenP, Jan 16 '19 at 21:57

score 0 · Answer 1 · answered Jan 16 '19 at 22:36

Create a dictionary that specifies how to aggregate each column: sum for the Flow columns and mean for the score columns (note that 'mean' is an unweighted average; the Flow weights are not applied).

# Assumes df is the DataFrame from the question
dd = {k: 'sum' for k in df.filter(regex='^Flow').columns}
for i in df.filter(like='Relia'):
    dd[i] = 'mean'
dd

Output:

{'Flow A': 'sum',
 'Flow B': 'sum',
 'Flow C': 'sum',
 'Flow D': 'sum',
 'Flow E': 'sum',
 'Reliability Score Flow A': 'mean',
 'Reliability Score Flow B': 'mean',
 'Reliability Score Flow C': 'mean',
 'Reliability Score Flow D': 'mean',
 'Reliability Score Flow E': 'mean'}

Then group by ID, pass the dictionary dd to agg, and reindex to restore the original column order:

df.groupby('ID').agg(dd).reindex(df.columns.drop('ID'), axis=1)
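A dictionary passed to agg aggregates each column on its own, so it cannot use the 'Flow X' column as weights for the matching score column. A minimal sketch of one way to get sums for the flows and weighted averages for the scores, assuming the columns follow the 'Flow X' / 'Reliability Score Flow X' naming from the question (the sample data is made up): multiply each score by its flow, sum everything per ID, then divide by the summed flow.

```python
import pandas as pd

# Hypothetical sample data following the question's column naming
df = pd.DataFrame({
    'ID': [1, 1, 2],
    'Flow A': [1.0, 3.0, 2.0],
    'Flow B': [2.0, 2.0, 5.0],
    'Reliability Score Flow A': [80.0, 90.0, 70.0],
    'Reliability Score Flow B': [60.0, 100.0, 50.0],
})

flow_cols = df.filter(regex='^Flow').columns
score_cols = [f'Reliability Score {c}' for c in flow_cols]  # assumed naming pattern

# Replace each score with score * flow so that a plain sum gives sum(score * flow)
tmp = df[['ID']].copy()
for f, s in zip(flow_cols, score_cols):
    tmp[f] = df[f]
    tmp[s] = df[s] * df[f]

sums = tmp.groupby('ID').sum()

# sum(score * flow) / sum(flow) is the flow-weighted average of the score
for f, s in zip(flow_cols, score_cols):
    sums[s] = sums[s] / sums[f]

# Restore the original column order (ID is now the index)
result = sums.reindex(columns=df.columns.drop('ID'))
print(result)
```

This avoids groupby().apply() with np.average, which raises ZeroDivisionError when a group's weights sum to zero; here such a group yields NaN for that score instead.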
