Groupby when values on different columns are True

Question

I have a dataframe:

data = {'first_column':  ['first_value', 'second_value', ...],
        'second_column': ['yes', 'no', ...],
        'third_column':  ['first_value', 'second_value', ...],
        'fourth_column': ['yes', 'no', ...],
        }

I'm trying to group by 'first_column' when the values in both 'second_column' and 'fourth_column' are 'yes', but I get an error: "TypeError: unsupported operand type(s) for &: 'list' and 'list'"

I receive no errors when the condition is set only to one column:

*data.groupby([data['second_column']=='yes'])[['first_column', 'third_column']].mean()*

but it fails as soon as I add the "&" operator:

*data.groupby(([data['second_column']=='yes']) & ([data['fourth_column']=='yes']))[['first_column', 'third_column']].mean()*

Is there a workaround here? Thanks!

Can you provide an example dataset (input and expected output)? – mozway Oct 20 '21 at 17:53

score 0 · Answer 1 · answered Oct 20 '21 at 17:45

You are not looking for a groupby; rather, you want to filter the rows and take the mean of the remaining records:

data[data['second_column']=='yes'][['first_column', 'third_column']].mean()
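
To cover both conditions from the question, the two comparisons can be combined into a single boolean mask. The following is a minimal sketch, assuming a DataFrame named `df` with made-up numeric values (the question elides the real data, and `.mean()` needs numeric columns). The `TypeError` in the question most likely comes from wrapping each comparison in `[...]`, which produces plain Python lists that do not support `&`.

```python
import pandas as pd

# Hypothetical sample data: the real values are elided in the question,
# so these numbers are invented purely for illustration.
df = pd.DataFrame({
    'first_column':  [1.0, 2.0, 3.0, 5.0],
    'second_column': ['yes', 'no', 'yes', 'yes'],
    'third_column':  [10.0, 20.0, 30.0, 50.0],
    'fourth_column': ['yes', 'yes', 'no', 'yes'],
})

cols = ['first_column', 'third_column']

# Each comparison yields a boolean Series. Parenthesise each one and join
# them with `&`; do not wrap them in [...] (that creates lists).
both_yes = (df['second_column'] == 'yes') & (df['fourth_column'] == 'yes')

# Filter on the combined mask, then average the remaining rows.
filtered_mean = df[both_yes][cols].mean()

# Grouping by the same mask gives one row of means per False/True group.
grouped_mean = df.groupby(both_yes)[cols].mean()

print(filtered_mean)
print(grouped_mean)
```

If only the rows where both columns are 'yes' matter, the filtered version is enough; the grouped version additionally reports the mean of the rows that fail the condition.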

Tom Ron

Chained subset expressions can be expensive both in processing and memory. Always combine expressions with `loc` -> `data.loc[data['second_column'] == 'yes', ['first_column', 'third_column']].mean()` – Henry Ecker Oct 20 '21 at 18:04
Can you please add a reference? – Tom Ron Oct 20 '21 at 18:05
SeaBean outlines this very well in [their answer here](https://stackoverflow.com/a/65875826/15497888), particularly with regard to performance. Additionally, almost all of the links in the accepted answer to [How to deal with SettingWithCopyWarning in Pandas](https://stackoverflow.com/a/20627316/15497888), including the official documentation on [Indexing and selecting data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html), outline the copy behaviour of subsets, as well as the issues that stem from this practice, especially around reassignment. – Henry Ecker Oct 20 '21 at 18:09
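
The chained form from the answer and the single-step `loc` form from the comments can be compared side by side. This is only a sketch on made-up numeric data (the DataFrame `df` and its values are assumptions, not the asker's dataset); it shows that both forms return the same means, while `loc` selects rows and columns in one indexing operation.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    'first_column':  [1.0, 2.0, 3.0, 5.0],
    'second_column': ['yes', 'no', 'yes', 'yes'],
    'third_column':  [10.0, 20.0, 30.0, 50.0],
})

cols = ['first_column', 'third_column']
mask = df['second_column'] == 'yes'

# Chained indexing: the row filter builds an intermediate DataFrame,
# and the column selection is then applied to that intermediate.
chained = df[mask][cols].mean()

# Single-step indexing: rows and columns are selected in one `loc` call.
single_step = df.loc[mask, cols].mean()

print(single_step)
```

For a read-only aggregation like this the results match; the difference matters more when the selection is later assigned to, which is where `SettingWithCopyWarning` tends to appear.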
