
I want to do a conditional aggregation in pandas with two conditions. I have seen "Python Pandas Conditional Sum with Groupby" and found it really useful, but it stops working when I add a second condition, for example:

g.apply(lambda x: x[x[x['key2'] == 'one']['data2']<0.4]['data1'].sum())

That is, I want to sum data1 over the rows where key2 equals 'one' and data2 is less than 0.4. But this doesn't work.

This is the error I get: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

1 Answer


If x has N1 rows, then x[x['key2'] == 'one'] has N2 <= N1 rows, and the mask x[x['key2'] == 'one']['data2'] < 0.4 built from it also has N2 rows. In the final x[...] step, x has N1 rows, but the mask inside the brackets has only N2 rows and its index covers only the rows where key2 is 'one'. pandas cannot align a boolean mask whose index does not match the frame being indexed, which is what the error message says. Instead, build both conditions on x itself and combine them with &, as in @pmarcol's suggestion:

g.apply(lambda x: x[(x['key2'] == 'one') & (x['data2'] < 0.4)]['data1'].sum())
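To make the row-count argument concrete, here is a minimal sketch on made-up data. The frame df and its values are hypothetical, and I am assuming g in the question is a groupby on a column called key1. It compares the length of the chained mask with the length of the frame, then computes the conditional sum per group in two ways: with apply as above, and with a mask built once on the whole frame.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question,
# the values and the grouping column key1 are assumptions.
df = pd.DataFrame({
    "key1": ["a", "a", "a", "b", "b", "b"],
    "key2": ["one", "two", "one", "one", "two", "one"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "data2": [0.1, 0.2, 0.9, 0.3, 0.1, 0.2],
})

# The chained mask from the question: it is built from the filtered
# frame, so it only has entries for the rows where key2 == 'one'.
chained_mask = df[df["key2"] == "one"]["data2"] < 0.4
n_frame, n_mask = len(df), len(chained_mask)

# Both conditions evaluated on the full frame, combined with &.
# The parentheses are required because & binds tighter than == and <.
mask = (df["key2"] == "one") & (df["data2"] < 0.4)

# Option 1: per-group apply, same logic as the one-liner above.
# Selecting the needed columns first keeps the grouping column
# out of the frame passed to the lambda.
result_apply = df.groupby("key1")[["key2", "data1", "data2"]].apply(
    lambda x: x.loc[(x["key2"] == "one") & (x["data2"] < 0.4), "data1"].sum()
)

# Option 2: zero out the non-matching rows, then do a plain grouped sum.
# Groups with no matching rows still appear, with a sum of 0.
result_where = df["data1"].where(mask, 0).groupby(df["key1"]).sum()
```

Option 2 avoids calling a Python lambda once per group, which may matter on large frames. If the two conditions are used in several places, storing the mask in a variable also keeps the indexing expression readable.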
STS