
I'm new to Python, so I wrote this function, which should normalize the values in the "price" column of my DataFrame:

def normalize_price(df): 
    for elements in df['price']: 
        if (df["price"]>= 1000) and (df['price']<= 1499): 
            df['price'] = 1000 
            return
        elif 1500 <= df['price'] <= 2499:
            df['price'] = 1500 
            return
        elif 2500 <= df['price'] <= 2999:
            df['price'] = 2500 
            return
        elif 3000 <= df['price'] <= 3999:
            df['price'] = 3000 
            return

When I call it, I get this error:

---------------------------------------------------------------------------
<ipython-input-86-1e239d3cbba4> in normalize_price(df)
     20 def normalize_price(df):
     21     for elements in df['price']:
---> 22         if (df["price"]>= 1000) and (df['price']<= 1499):
     23             df['price'] = 1000
     24             return

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

And since I'm going crazy, I'd like to know why :) Thanks!

mandiatutti
  • Does this answer your question? [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) – Bruno Mello Apr 08 '20 at 19:18
  • You aren't using `elements`, try `if elements >= 1000`, rather than the whole column – C.Nivs Apr 08 '20 at 19:19
  • You are comparing a whole Series to an integer. `df["price"]` is the complete column from your DataFrame, so when you use `if (df["price"] >= 1000)` the result is ambiguous. – benja d Apr 08 '20 at 19:23
  • Use `df.loc` or `np.select`. – Umar.H Apr 08 '20 at 19:24
  • Right, now I see it. So should I write `df['price'][elements]`? – mandiatutti Apr 08 '20 at 19:26
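
To make the comments above concrete, here is a minimal sketch. The sample prices and the helper name `normalize_one` are made up for illustration and are not from the original post. It shows that comparing the whole column yields one boolean per row, which `if` and `and` cannot reduce to a single True/False, while comparing one scalar at a time works.

```python
import pandas as pd

df = pd.DataFrame({"price": [1200, 1600, 2700, 3499, 5000]})  # made-up sample data

# Comparing the whole column gives one boolean per row, not a single bool.
mask = df["price"] >= 1000
print(mask.tolist())

# Using that Series in a boolean context is what raises the error.
error_message = ""
try:
    if (df["price"] >= 1000) and (df["price"] <= 1499):
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)


def normalize_one(price):
    """Hypothetical per-value helper: map one scalar price to its range floor."""
    if 1000 <= price <= 1499:
        return 1000
    elif 1500 <= price <= 2499:
        return 1500
    elif 2500 <= price <= 2999:
        return 2500
    elif 3000 <= price <= 3999:
        return 3000
    return price  # values outside the listed ranges are left untouched


# apply() calls the helper once per element, so every comparison is scalar.
normalized = df["price"].apply(normalize_one)
print(normalized.tolist())  # [1000, 1500, 2500, 3000, 5000]
```

So there is no need to index with `df['price'][elements]`: the loop variable (or the argument passed by `apply`) already is the single value. The vectorized answers below are usually faster on large frames.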

3 Answers


`np.select` is probably the easiest approach:

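import numpy as np

def normalize_price(df):
    # create a list of conditions; each one must be a boolean Series,
    # so combine the comparisons with & (a chained comparison such as
    # 1500 <= df['price'] <= 2499 raises the same ValueError)
    cond = [
        (df['price'] >= 1000) & (df['price'] <= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]
    # create a list of choices, in the same order as the conditions above
    choice = [
        1000,
        1500,
        2500,
        3000
    ]
    # use numpy.select and assign the array to df['price'];
    # rows that match no condition keep their original price
    df['price'] = np.select(cond, choice, df['price'])
    return df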
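
As a side note, each range condition can also be written with `Series.between`, which is inclusive on both ends by default. This is a sketch of an alternative spelling, not part of the original answer, and the sample prices are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [905, 1200, 2100, 2895, 3462, 5192]})  # made-up sample data

# between(lower, upper) is inclusive on both ends by default, so it matches
# the "lower <= price <= upper" ranges from the question.
conditions = [
    df["price"].between(1000, 1499),
    df["price"].between(1500, 2499),
    df["price"].between(2500, 2999),
    df["price"].between(3000, 3999),
]
choices = [1000, 1500, 2500, 3000]

# Rows matching no condition fall back to their original price.
result = np.select(conditions, choices, default=df["price"])
print(result.tolist())  # [905, 1000, 1500, 2500, 3000, 5192]
```

Each `between` call returns a boolean Series, which is what `np.select` expects for its condition list.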

Update with an example:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10000, 50), columns=['price'])

def normalize_price(df): 
    cond = [
        (df["price"]>= 1000) & (df['price']<= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]

    choice = [
        1000,
        1500,
        2500,
        3000
    ]

    df['price_new'] = np.select(cond, choice, df['price'])
    return df

normalize_price(df)

    price  price_new
0     235        235
1    5192       5192
2     905        905
3    7813       7813
4    2895       2500 <-----
5    5056       5056
6     144        144
7    4225       4225
8    7751       7751
9    3462       3000 <----
It_is_Chris

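Here you should really avoid for loops and if statements. If you just want to round each price down to a multiple of 500 (which treats 1500 to 2499 as two steps rather than the single range in the question), you could do: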

import pandas as pd
import numpy as np

df = pd.DataFrame({"price":[1200, 1600, 2100, 3499]})

df["price"] = (df["price"]/500).apply(np.floor)*500
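
A quick check of what this produces on the same sample prices. The integer floor-division form `// 500 * 500` is swapped in here as an equivalent that keeps the integer dtype; it is a suggestion, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [1200, 1600, 2100, 3499]})

# Float route: divide, floor, multiply back (the result is float64).
floored_float = np.floor(df["price"] / 500) * 500

# Integer route: floor division keeps the integer dtype.
floored_int = df["price"] // 500 * 500

print(floored_float.tolist())  # [1000.0, 1500.0, 2000.0, 3000.0]
print(floored_int.tolist())    # [1000, 1500, 2000, 3000]
```

Note that 2100 lands on 2000 here, whereas the ranges in the question would map it to 1500, so this shortcut only fits if uniform 500-wide steps are acceptable.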

EDIT: if you are looking for a more general solution:


df = pd.DataFrame({"price":[1200, 1600, 2100, 3499,3600, 140000, 160000]})

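# step size per row: 5 * 10**(number of digits - 2),
# e.g. 500 for 4-digit prices and 50000 for 6-digit prices
df["div"] = 5*10**(df["price"].astype(str).str.len()-2)
# floor each price to a multiple of its own step
(df["price"]/df["div"]).apply(np.floor)*df["div"]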
rpanai

You can use `pandas.cut` for this. In your case:

bins=[1000, 1500, 2500, 3000, 4000]

df["bin"]=pd.cut(df["price"], bins, right=False, retbins=False, labels=bins[:-1])

This assumes the `bin` column is the output column of your function.
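
A runnable sketch of the same call, with made-up sample prices, showing two things worth knowing: prices outside every interval come back as NaN, and the result is categorical rather than numeric. The fallback to the original price at the end is my assumption about what the asker wants for out-of-range values.

```python
import pandas as pd

df = pd.DataFrame({"price": [905, 1200, 2100, 2895, 3462, 5192]})  # made-up sample data

bins = [1000, 1500, 2500, 3000, 4000]

# right=False makes each interval closed on the left: [1000, 1500), [1500, 2500), ...
# labels=bins[:-1] names each interval after its lower edge.
binned = pd.cut(df["price"], bins, right=False, labels=bins[:-1])

# Prices that fall outside every interval are NaN.
print(binned.isna().tolist())  # [True, False, False, False, False, True]

# Convert the categorical labels to numbers, then fall back to the
# original price wherever no interval matched (assumed desired behavior).
normalized = binned.astype("float").fillna(df["price"]).astype("int64")
print(normalized.tolist())  # [905, 1000, 1500, 2500, 3000, 5192]
```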

Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html

Grzegorz Skibinski