lambda function with if...else in agg on dataframe

Question

I am trying to calculate aggregate using a lambda function with if... else.

My dataframe looks like this

    States  Sales
0   Delhi   0.0
1   Kerala  2.5
2   Punjab  5.0
3   Haryana 7.5
4   Delhi   10.0
5   Kerala  12.5
6   Punjab  15.0
7   Haryana 17.5

Expected summary table should like this

States  Sales
Delhi   10.0
Haryana 25.0
Kerala  15.0
Punjab  20.0

I tried using the following code

df.groupby('States').agg({
                    'Sales':lambda x: np.sum(x) if (x>7) else 0
                    })

I get ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What am I doing wrong?

Does this answer your question? [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) — Iguananaut, Sep 27 '21 at 12:30
How did it not solve the issue? It is exactly that issue, as the below answers also demonstrate. You need to use `(x > 7).any()` for example. — Iguananaut, Sep 27 '21 at 14:55
True, my bad..... I could not apply the suggestions in the post as demonstrated by posts below. Thanks though — geoabram, Sep 27 '21 at 14:57

jezrael · Answer 1 · 2021-09-27T12:33:32.960

2

If compare by (x>7) ouput is Series with True and Falses, for test if match at least value use any:

df1 = df.groupby('States').agg({'Sales':lambda x: np.sum(x) if (x>7).any() else 0})
print (df1)
         Sales
States        
Delhi     10.0
Haryana   25.0
Kerala    15.0
Punjab    20.0

If need replace all values lower or equal like 7 to 0 and then aggregate sum use:

df2 = df['Sales'].where(df['Sales'].gt(7), 0).to_frame().groupby(df['States']).sum()
print (df2)
         Sales
States        
Delhi     10.0
Haryana   25.0
Kerala    12.5
Punjab    15.0

edited Sep 27 '21 at 12:33

answered Sep 27 '21 at 12:23

jezrael

822,522
95
1,334
1,252

Thanks @jezrael, for both options! – geoabram Sep 27 '21 at 14:53
@jezrael. Just curious, how should we approach this if we want to filter based on the value of a different column (say column C) and not based on aggregated value of the column C? – tjt Mar 09 '23 at 19:18
1

@tjt - if change `df['Sales'].where(df['Sales'].gt(7), 0)` to `df['Sales'].where(df['Another col'].gt(7), 0)` it working? – jezrael Mar 10 '23 at 06:22
@jezrael Thanks. I think that will do. Is there a way to approach the following slight different question - (using the group by on states) if 'another col' gt (7) do sum, if lt(7) do avg or even cumsum. Sorry, I know this thread is for the main question posted and I am diverting. – tjt Mar 10 '23 at 15:21

score 1 · Answer 2 · answered Sep 27 '21 at 12:30

1

 df.groupby('States')['Sales'].agg(lambda x: sum(x) if x.ge(7).any() else 0).to_frame('Sales')

         Sales
States       
Delhi    10.0
Haryana  25.0
Kerala   15.0
Punjab   20.0

answered Sep 27 '21 at 12:30

wwnde

26,119
6
18
32

Thanks @wwnde, that worked! – geoabram Sep 27 '21 at 14:53

lambda function with if...else in agg on dataframe

2 Answers2