The truth value of a Series is ambiguous. Can't figure it out

Question

I'm new to Python. I wrote this function, which should normalize the price values contained in the "price" column of my DataFrame:

def normalize_price(df): 
    for elements in df['price']: 
        if (df["price"]>= 1000) and (df['price']<= 1499): 
            df['price'] = 1000 
            return
        elif 1500 <= df['price'] <= 2499:
            df['price'] = 1500 
            return
        elif 2500 <= df['price'] <= 2999:
            df['price'] = 2500 
            return
        elif 3000 <= df['price'] <= 3999:
            df['price'] = 3000 
            return

When I call it, I get this error:

---------------------------------------------------------------------------
<ipython-input-86-1e239d3cbba4> in normalize_price(df)
     20 def normalize_price(df):
     21     for elements in df['price']:
---> 22         if (df["price"]>= 1000) and (df['price']<= 1499):
     23             df['price'] = 1000
     24             return

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

And since I'm going crazy, I'd like to know why :) Thanks!

Does this answer your question? [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) — Bruno Mello, Apr 08 '20 at 19:18
You aren't using `elements`, try `if elements >= 1000`, rather than the whole column — C.Nivs, Apr 08 '20 at 19:19
You are comparing a whole Series to an integer. `df["price"]` is the complete column from your DataFrame, so when you use `if (df["price"]>= 1000)` the result is ambiguous. — benja d, Apr 08 '20 at 19:23
Right. Now I notice it... So, I should write `df['price'][elements]`? — mandiatutti, Apr 08 '20 at 19:26
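To illustrate the comments above: inside the loop, `elements` is already a single price (a scalar), not an index, so the comparison should use it directly; `df['price'][elements]` would instead look up a row label equal to the price. A minimal sketch, assuming integer prices and the question's intervals, that maps each value with a plain Python function and `Series.apply` (the helper name `bucket_price` is hypothetical):

```python
import pandas as pd

def bucket_price(price):
    # price is a single number here, so chained comparisons work as expected
    if 1000 <= price <= 1499:
        return 1000
    elif 1500 <= price <= 2499:
        return 1500
    elif 2500 <= price <= 2999:
        return 2500
    elif 3000 <= price <= 3999:
        return 3000
    # values outside every interval are kept unchanged
    return price

def normalize_price(df):
    # apply calls bucket_price once per element and returns a new Series
    df['price'] = df['price'].apply(bucket_price)
    return df

df = pd.DataFrame({'price': [235, 1200, 2895, 3462, 5192]})
print(normalize_price(df)['price'].tolist())  # [235, 1000, 2500, 3000, 5192]
```

This reads like the original loop but still runs Python code once per row; the vectorized approaches in the answers scale better on large frames.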

It_is_Chris · Accepted Answer · 2020-04-08T20:13:27.280

2

`np.select` is probably the easiest approach:

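import numpy as np

def normalize_price(df):
    # create a list of conditions; each comparison is parenthesized and combined
    # element-wise with & (a chained a <= s <= b raises the same error on a Series)
    cond = [
        (df['price'] >= 1000) & (df['price'] <= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]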
    # create a list of choices based on the conditions above
    choice = [
        1000,
        1500,
        2500,
        3000
    ]
    # use numpy.select and assign array to df['price']
    df['price'] = np.select(cond, choice, df['price'])
    return df

Update with example:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10000, 50), columns=['price'])

def normalize_price(df): 
    cond = [
        (df["price"]>= 1000) & (df['price']<= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]

    choice = [
        1000,
        1500,
        2500,
        3000
    ]

    df['price_new'] = np.select(cond, choice, df['price'])
    return df

normalize_price(df)

    price  price_new
0     235        235
1    5192       5192
2     905        905
3    7813       7813
4    2895       2500 <-----
5    5056       5056
6     144        144
7    4225       4225
8    7751       7751
9    3462       3000 <----
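Writing one condition per interval by hand gets long when there are many intervals (the asker mentions ranges such as 150000-200000 in the comments). A minimal sketch, assuming a hypothetical `intervals` table of `(low, high, value)` tuples and a hypothetical two-argument `normalize_price`, that builds the same `np.select` inputs from that table:

```python
import numpy as np
import pandas as pd

# Hypothetical interval table: (lower bound, upper bound, normalized value).
# Further rows such as (150000, 200000, 150000) can be appended.
intervals = [
    (1000, 1499, 1000),
    (1500, 2499, 1500),
    (2500, 2999, 2500),
    (3000, 3999, 3000),
]

def normalize_price(df, intervals):
    price = df['price']
    # one boolean Series per interval, both endpoints included
    cond = [(price >= low) & (price <= high) for low, high, _ in intervals]
    choice = [value for _, _, value in intervals]
    # rows that match no interval keep their original price
    df['price_new'] = np.select(cond, choice, price)
    return df

df = pd.DataFrame({'price': [235, 1000, 1499, 2895, 3462, 4000]})
print(normalize_price(df, intervals)['price_new'].tolist())
# [235, 1000, 1000, 2500, 3000, 4000]
```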

edited Apr 08 '20 at 20:13

answered Apr 08 '20 at 19:25


But this way, in the cond list I'm still looking at df["price"], which is a list... – mandiatutti Apr 08 '20 at 19:50
`df['price']` is not a list it is a `pd.Series` – It_is_Chris Apr 08 '20 at 19:51
then rewrite your conditions: `(df['price'] >= 1500) & (df['price'] <= 2499)` – It_is_Chris Apr 08 '20 at 20:01
I know this should be really basic, but that's what I've been trying to do for a while now... ahahaha – mandiatutti Apr 08 '20 at 20:07
And that works perfectly! But I have one more question. Apparently, if I write `1500 <= df['price'] <= 2499` it doesn't work, while if I use the other form it does... Why? :) – mandiatutti Apr 08 '20 at 20:30

check out this [answer](https://stackoverflow.com/a/36922103/9177877). It provides a good explanation – It_is_Chris Apr 08 '20 at 20:39
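In short, Python expands `1500 <= s <= 2499` into `(1500 <= s) and (s <= 2499)`. The `and` has to call `bool()` on the first result to decide whether to short-circuit, and a boolean Series with more than one element cannot be reduced to a single True/False, so the same `ValueError` is raised. The sketch below, using a small made-up Series `s`, reproduces the error and shows two element-wise alternatives: `&` with parenthesized comparisons, and `Series.between`, which includes both endpoints by default.

```python
import pandas as pd

s = pd.Series([1200, 1600, 2600])

# Chained comparison: evaluated as (1500 <= s) and (s <= 2499);
# `and` needs bool() of a whole boolean Series, which is ambiguous.
try:
    1500 <= s <= 2499
except ValueError as err:
    print(err)  # The truth value of a Series is ambiguous. ...

# Element-wise alternatives: both return a boolean Series
mask_and = (s >= 1500) & (s <= 2499)
mask_between = s.between(1500, 2499)  # both endpoints included by default

print(mask_and.tolist())      # [False, True, False]
print(mask_between.tolist())  # [False, True, False]
```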

rpanai · Answer 2 · 2020-04-08T20:03:14.650

2

Here you should really avoid for loops and if statements. You just want to round each price down to a multiple of 500, so you could do:

import pandas as pd
import numpy as np

df = pd.DataFrame({"price":[1200, 1600, 2100, 3499]})

df["price"] = (df["price"]/500).apply(np.floor)*500
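For integer prices, the same rounding down can be done with floor division, which keeps the column as integers instead of turning it into floats. A minimal sketch reusing the example values above; note that it maps to every multiple of 500, so for example 2100 becomes 2000, whereas the question's intervals would map it to 1500.

```python
import pandas as pd

df = pd.DataFrame({"price": [1200, 1600, 2100, 3499]})

# // is element-wise floor division on a Series; multiplying back gives the
# largest multiple of 500 that is <= each price, still as an integer column
df["price"] = df["price"] // 500 * 500

print(df["price"].tolist())  # [1000, 1500, 2000, 3000]
```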

EDIT: if you are looking for a more general solution:


df = pd.DataFrame({"price":[1200, 1600, 2100, 3499,3600, 140000, 160000]})

df["div"] = 5*10**(df["price"].astype(str).str.len()-2)
(df["price"]/df["div"]).apply(np.floor)*df["div"]
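The `div` column picks a step from the number of digits in each price: 500 for four-digit prices, 5000 for five digits, 50000 for six, and so on. A sketch of the same idea in integer arithmetic, assuming non-negative integer prices with at least two digits (the string length would be wrong for floats such as `1200.5` or for negative numbers):

```python
import pandas as pd

df = pd.DataFrame({"price": [1200, 1600, 2100, 3499, 3600, 140000, 160000]})

# number of decimal digits of each (non-negative integer) price
digits = df["price"].astype(str).str.len()
# step of 5 * 10**(digits - 2): 500 for 4 digits, 50000 for 6 digits
step = 5 * 10 ** (digits - 2)
# round each price down to a multiple of its own step
df["price_new"] = df["price"] // step * step

print(df["price_new"].tolist())
# [1000, 1500, 2000, 3000, 3500, 100000, 150000]
```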

edited Apr 08 '20 at 20:03

answered Apr 08 '20 at 19:29


That's not quite what I want to do, since I have predefined intervals for bigger values than just the ones I posted... For example, one of my intervals is 150000-200000, and for that one it just won't work – mandiatutti Apr 08 '20 at 19:37
But thanks for your input, I just learnt something useful! :) – mandiatutti Apr 08 '20 at 19:38
what do you mean? – mandiatutti Apr 08 '20 at 19:40
Uuuh nice! For this purpose I shall stick with the simple idea I had, but that solution is way cleaner! Thanks! – mandiatutti Apr 08 '20 at 19:47

I added an example so it will work for the general case. – rpanai Apr 08 '20 at 20:03

score 0 · Answer 3 · answered Apr 08 '20 at 19:44

You can use `pandas.cut` for that purpose. In your case:

import pandas as pd

bins = [1000, 1500, 2500, 3000, 4000]
# each bin is labelled with its lower edge
df["bin"] = pd.cut(df["price"], bins, right=False, retbins=False, labels=bins[:-1])

This assumes the `bin` column is the output column for your function.
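With `right=False` each bin is closed on the left, so the bins are [1000, 1500), [1500, 2500), [2500, 3000) and [3000, 4000), which matches the question's integer intervals. Prices outside all bins (below 1000, or 4000 and above) become `NaN`. A minimal sketch, assuming those prices should be kept unchanged as in the `np.select` answer, converts the categorical result to float and fills the gaps from the original column:

```python
import pandas as pd

df = pd.DataFrame({"price": [235, 1000, 1499, 2895, 3462, 4000]})
bins = [1000, 1500, 2500, 3000, 4000]

# categorical Series labelled by each bin's lower edge; NaN outside [1000, 4000)
binned = pd.cut(df["price"], bins, right=False, labels=bins[:-1])

# keep the original price wherever no bin matched, then go back to integers
df["bin"] = binned.astype(float).fillna(df["price"]).astype(int)

print(df["bin"].tolist())  # [235, 1000, 1000, 2500, 3000, 4000]
```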

Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html
