
I have this dataframe:

import pandas as pd

df = pd.DataFrame({
    'text': ['a', 'a', 'a', 'b', 'b'],
    'group': [1, 1, 1, 2, 2],
    'value': [1, 2, 3, 4, 5],
    'some_other_to_include': ['a', 'a', 'c', 'b', 'b'],
    'criticality': [3, 3, 5, 4, 4]
})

I want to group by the 'group' column, take the average of the 'value' column, and keep the other columns from the row with the highest 'criticality' in each group.

Intended result:

text    group    value    some_other_to_include    criticality
a       1        2        c                        5
b       2        4.5      b                        4

But I can't figure out a way to do this without building a new dataframe from scratch using nlargest and mean. Is there a smarter way of doing this?
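For context, the roundabout version I'm trying to avoid looks roughly like this:

# roughly what I have now: build each group's row by hand, taking the
# top-criticality row via nlargest and patching in the group mean
rows = []
for _, grp in df.groupby('group'):
    row = grp.nlargest(1, 'criticality').iloc[0].copy()
    row['value'] = grp['value'].mean()
    rows.append(row)
out = pd.DataFrame(rows)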

Loxx

1 Answer


Use DataFrameGroupBy.idxmax to get the indices of the rows with the maximal criticality per group, select those rows with DataFrame.loc, and then replace the value column with the per-group mean via DataFrame.assign:

g = df.groupby('group')

# take the row with the highest criticality per group, then
# overwrite 'value' with that group's mean
out = (df.loc[g['criticality'].idxmax()]
         .assign(value=lambda x: x['group'].map(g['value'].mean())))
print(out)
  text  group  value some_other_to_include  criticality
2    a      1    2.0                     c            5
3    b      2    4.5                     b            4
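To see how the pieces fit: g['criticality'].idxmax() returns one row label per group (the first row at the maximum if there are ties), and Series.map lines the per-group means up with those rows. For the sample df:

print(g['criticality'].idxmax())
group
1    2
2    3
Name: criticality, dtype: int64

print(g['value'].mean())
group
1    2.0
2    4.5
Name: value, dtype: float64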

Or, with DataFrame.join:

g = df.groupby('group')

# drop the original 'value', join the per-group mean back on 'group',
# and restore the original column order
out = (df.loc[g['criticality'].idxmax()]
         .drop('value', axis=1)
         .join(g['value'].mean(), on='group')[df.columns])

print(out)
  text  group  value some_other_to_include  criticality
2    a      1    2.0                     c            5
3    b      2    4.5                     b            4
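A further variation along the same lines, if you prefer to avoid idxmax: overwrite value with a per-group transform('mean') first, then keep one row per group by criticality. Note the tie-breaking differs here, drop_duplicates with keep='last' keeps the last tied row, while idxmax keeps the first:

out = (df.assign(value=df.groupby('group')['value'].transform('mean'))
         .sort_values('criticality', kind='stable')
         .drop_duplicates('group', keep='last')
         .sort_index())
print(out)
  text  group  value some_other_to_include  criticality
2    a      1    2.0                     c            5
4    b      2    4.5                     b            4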
        
jezrael