Identify unique combinations of values in columns, sum another column, and count number of appearances in pandas

Question

I have a DataFrame like so:

import pandas as pd

d = {'param_1': [1.0, 2.0, 1.0, 1.0, 3.0, 3.0, 1.0, 2.0, 2.0,], 
     'param_2': [0.02, 0.08, 0.02, 0.08, 0.08, 0.02, 0.02, 0.08, 0.02],
     'param_3': [0.8, 1.6, 2.5, 1.6, 1.6, 2.5, 0.8, 1.6, 0.8],
     'param_4': [0, 1, 0, 0, 1, 0, 0, 1, 0],
     'output_value': [67.85, 32.56, 126.4, -56.32, -10.36, 3.67, -6.78, -96.8, 325.4]
    }

df = pd.DataFrame(d)
print(df)

      param_1  param_2  param_3  param_4  output_value
0      1.0     0.02      0.8        0          67.85
1      2.0     0.08      1.6        1          32.56
2      1.0     0.02      2.5        0         126.40
3      1.0     0.08      1.6        0         -56.32
4      3.0     0.08      1.6        1         -10.36
5      3.0     0.02      2.5        0           3.67
6      1.0     0.02      0.8        0          -6.78
7      2.0     0.08      1.6        1         -96.80
8      2.0     0.02      0.8        0         325.40

I would like to identify the combinations of params (in the above example, rows 1 and 7 are the same combination of params, which resulted in different output_values) , then sum the output_value from each row where that combination of parameters appears. I would also like to count the number of times that combination of parameters appears.

I came up with this:

for col in df.columns[0:4]:
    for each in df[col].unique():
        print(col, ":", each, "value sum:", df[df[col] == each]['output_value'].sum(), "... number of unique values:", pd.Series(df[df[col] == each]['output_value'].unique()).count())

It’s useful, but wrong: the sums include duplicate values, and the params are all individualized.

Output should be something like this, for each combination of parameters that appears, or param_set:

param_set: {param_1: 2.0, param_2: 0.08, param_3: 1.6, param_4: 1} 
output_value_sum = -64.24
n_rows_for_this_set = 2

param_set: {param_1: 1.0, param_2: 0.02, param_3: 0.8, param_4: 0} 
output_value_sum = -61.07
n_rows_for_this_set = 2

This similar question: unique combinations of values in selected columns in pandas data frame and count seems to answer the count question, and this: df.groupby(list(df.columns[0:4])).size() returns part of what I need.

score 0 · Answer 1 · answered Jul 10 '18 at 10:01

Try using df.groupby(['param_1', 'param_2', 'param_3', 'param_4']).agg(['sum', 'count']), which returns:

                                 output_value      
                                         sum count
param_1 param_2 param_3 param_4                   
1.0     0.02    0.8     0              61.07     2
                2.5     0             126.40     1
        0.08    1.6     0             -56.32     1
2.0     0.02    0.8     0             325.40     1
        0.08    1.6     1             -64.24     2
3.0     0.02    2.5     0               3.67     1
        0.08    1.6     1             -10.36     1

You could unpack this DataFrame to print it out the way you'd prefer to see it, or just view it this way!

Did you answer you're own question or are you satisfied with your solution? — Scott Boston, Jul 10 '18 at 12:33
I did answer my own question @ScottBoston, not sure if I did this in the correct manner, although StackOverflow seemed to invite me to with their "Answer your own question!" check box :) — tw0000, Jul 10 '18 at 21:24

Identify unique combinations of values in columns, sum another column, and count number of appearances in pandas

1 Answers1

Linked