I have a DataFrame like so:
import pandas as pd
d = {'param_1': [1.0, 2.0, 1.0, 1.0, 3.0, 3.0, 1.0, 2.0, 2.0,],
'param_2': [0.02, 0.08, 0.02, 0.08, 0.08, 0.02, 0.02, 0.08, 0.02],
'param_3': [0.8, 1.6, 2.5, 1.6, 1.6, 2.5, 0.8, 1.6, 0.8],
'param_4': [0, 1, 0, 0, 1, 0, 0, 1, 0],
'output_value': [67.85, 32.56, 126.4, -56.32, -10.36, 3.67, -6.78, -96.8, 325.4]
}
df = pd.DataFrame(d)
print(df)
param_1 param_2 param_3 param_4 output_value
0 1.0 0.02 0.8 0 67.85
1 2.0 0.08 1.6 1 32.56
2 1.0 0.02 2.5 0 126.40
3 1.0 0.08 1.6 0 -56.32
4 3.0 0.08 1.6 1 -10.36
5 3.0 0.02 2.5 0 3.67
6 1.0 0.02 0.8 0 -6.78
7 2.0 0.08 1.6 1 -96.80
8 2.0 0.02 0.8 0 325.40
I would like to identify the combinations of params (in the above example, rows 1 and 7 are the same combination of params, which resulted in different output_values) , then sum the output_value
from each row where that combination of parameters appears. I would also like to count the number of times that combination of parameters appears.
I came up with this:
for col in df.columns[0:4]:
for each in df[col].unique():
print(col, ":", each, "value sum:", df[df[col] == each]['output_value'].sum(), "... number of unique values:", pd.Series(df[df[col] == each]['output_value'].unique()).count())
It’s useful, but wrong: the sums include duplicate values, and the params are all individualized.
Output should be something like this, for each combination of parameters that appears, or param_set
:
param_set: {param_1: 2.0, param_2: 0.08, param_3: 1.6, param_4: 1}
output_value_sum = -64.24
n_rows_for_this_set = 2
param_set: {param_1: 1.0, param_2: 0.02, param_3: 0.8, param_4: 0}
output_value_sum = -61.07
n_rows_for_this_set = 2
This similar question: unique combinations of values in selected columns in pandas data frame and count seems to answer the count question, and this: df.groupby(list(df.columns[0:4])).size()
returns part of what I need.