
I have a dataframe:

import pandas as pd

data = pd.DataFrame({
    'first_column':  ['first_value', 'second_value', ...],
    'second_column': ['yes', 'no', ...],
    'third_column':  ['first_value', 'second_value', ...],
    'fourth_column': ['yes', 'no', ...],
})

I'm trying to group by 'first_column' where the values in both 'second_column' and 'fourth_column' are 'yes', but I get an error: `TypeError: unsupported operand type(s) for &: 'list' and 'list'`

I get no error when the condition is on only one column:

`data.groupby([data['second_column'] == 'yes'])[['first_column', 'third_column']].mean()`

but it fails as soon as I add the `&` operator:

`data.groupby(([data['second_column'] == 'yes']) & ([data['fourth_column'] == 'yes']))[['first_column', 'third_column']].mean()`

Is there a workaround here? Thanks!

invesTIPS

1 Answer


You don't need `groupby` here; you want to filter the rows and take the mean of the remaining records:

`data[data['second_column'] == 'yes'][['first_column', 'third_column']].mean()`
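The filter above covers one condition, while the question combines two. Here is a minimal sketch, assuming a small hypothetical frame with numeric `first_column` and `third_column` (`mean()` needs numbers, and the question's real values are elided). It shows why the original raises and how to build the combined mask. Wrapping each comparison in `[...]` creates a Python list, and lists do not support `&`. Parenthesizing each comparison instead yields a single boolean Series. The filter uses the single `.loc` call recommended in the comments on this answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's elided values
data = pd.DataFrame({
    'first_column':  [1.0, 2.0, 3.0, 5.0],
    'second_column': ['yes', 'no', 'yes', 'yes'],
    'third_column':  [10.0, 20.0, 30.0, 70.0],
    'fourth_column': ['yes', 'yes', 'no', 'yes'],
})

# [Series] & [Series] is list & list, which Python does not define
try:
    [data['second_column'] == 'yes'] & [data['fourth_column'] == 'yes']
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # unsupported operand type(s) for &: 'list' and 'list'

# Parenthesize each comparison (& binds tighter than ==) to get one boolean Series
mask = (data['second_column'] == 'yes') & (data['fourth_column'] == 'yes')

# Select the matching rows and the two columns in one .loc call, then average
result = data.loc[mask, ['first_column', 'third_column']].mean()
print(result)

# Grouping by the mask itself gives one row for False and one for True
grouped = data.groupby(mask)[['first_column', 'third_column']].mean()
print(grouped)
```

`groupby(mask)` mirrors the question's original approach. It is only worth using if the mean of the non-matching rows is also of interest; otherwise the `.loc` filter is enough.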
Tom Ron
  • Chained subset expressions can be expensive both in processing and memory. Always combine expressions with `loc` -> `data.loc[data['second_column'] == 'yes', ['first_column', 'third_column']].mean()` – Henry Ecker Oct 20 '21 at 18:04
  • Can you please add a reference? – Tom Ron Oct 20 '21 at 18:05
  • SeaBean outlines this very well in [their answer here](https://stackoverflow.com/a/65875826/15497888), particularly regarding performance. Additionally, almost all of the links in the accepted answer to [How to deal with SettingWithCopyWarning in Pandas](https://stackoverflow.com/a/20627316/15497888), including the official documentation on [Indexing and selecting data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html), describe how subsetting returns copies and the issues that stem from this practice, especially around reassignment. – Henry Ecker Oct 20 '21 at 18:09
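The reassignment issue raised in these comments can be seen in a minimal sketch, assuming a small hypothetical frame that reuses the question's column names. Boolean indexing such as `data[mask]` returns a new frame, so assigning into it through a chained expression changes only that temporary copy. A single `.loc` call writes into `data` itself. The exact warning differs across pandas versions (releases using Copy-on-Write report chained assignment differently), so the sketch only records and prints whatever warnings were raised rather than assuming a particular class.

```python
import warnings

import pandas as pd

# Hypothetical frame; column names mirror the question
data = pd.DataFrame({
    'second_column': ['yes', 'no', 'yes'],
    'third_column':  [1.0, 2.0, 3.0],
})
mask = data['second_column'] == 'yes'

# Chained assignment: data[mask] is a copy, so `data` is left unchanged.
# Warnings are recorded (not raised) so the demo runs under any warning filter.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    data[mask]['third_column'] = 0.0
print([w.category.__name__ for w in caught])

after_chained = data['third_column'].tolist()
print(after_chained)  # [1.0, 2.0, 3.0]

# One .loc call selects rows and column together and modifies `data` in place
data.loc[mask, 'third_column'] = 0.0
print(data['third_column'].tolist())  # [0.0, 2.0, 0.0]
```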