I don't understand why my if statement does not work correctly

Question

I wrote the following if statement, but when I run it, Spyder shows me this message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Why does this happen? I specified the value to assign to the column if none of the conditions is met.

if 11 >= df['age'] <= 20:
    df['age_enc'] = 20
elif 21 >= df['age'] <= 25:
    df['age_enc'] = 25
elif 26 >= df['age'] <= 30:
    df['age_enc'] = 30
elif 31 >= df['age'] <= 35:
    df['age_enc'] = 35       
elif 36 >= df['age'] <= 40:
    df['age_enc'] = 40    
elif 41 >= df['age'] <= 50:
    df['age_enc'] = 50  
elif 51 >= df['age'] <= 60:
    df['age_enc'] = 60  
else:
    df['age_enc'] = 100;

`df['age']` is not a single value but a whole column. For such cases where you need to compare more than one values in an if statement, you will have to use `any` or `all` — Sheldore, Dec 15 '18 at 12:40
Check the content of df. df['age'] is not a scalar value, but a matrix or vector maybe — Arashsyh, Dec 15 '18 at 12:45
Also, you mean `<=` in the first part of all your conditions, not `>=` — Thierry Lathuille, Dec 15 '18 at 12:56
You want [`pandas.cut`](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.cut.html) — roganjosh, Dec 15 '18 at 13:40

NaWeeD · Answer 1 · 2018-12-15T14:16:24.377

1

Because df['age'] is a pandas Series and not a single value, a comparison such as df['age'] <= 20 produces one boolean per row, so Python cannot tell whether you mean any of the values or all of them. You have to say which one explicitly, for example (df['age'] <= 20).all().

To solve your problem you can use pandas filtering as follows:

# keep only the rows with 11 <= age <= 20; .copy() avoids SettingWithCopyWarning
df_part = df[(df['age'] <= 20) & (df['age'] >= 11)].copy()
df_part['age_enc'] = 20

Then you can merge those DataFrame parts back together, for example with pd.concat.
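As a rough sketch of the same filtering idea, the boolean mask can also be used with `.loc` to write into the original DataFrame directly, so that no parts have to be merged afterwards. The sample `df` below is made up for illustration, and only the first two ranges from the question are shown.

```python
import pandas as pd

# Hypothetical sample data; the real df comes from the question.
df = pd.DataFrame({"age": [15, 22, 70]})

# Start with the fallback value from the question's else branch.
df["age_enc"] = 100

# Boolean mask: True for rows whose age is between 11 and 20 inclusive.
mask = (df["age"] >= 11) & (df["age"] <= 20)
df.loc[mask, "age_enc"] = 20

# Series.between is inclusive on both ends by default.
df.loc[df["age"].between(21, 25), "age_enc"] = 25

print(df)
```

Each range still needs its own line this way, so it is mainly useful when there are only a few ranges.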

edited Dec 15 '18 at 14:16

answered Dec 15 '18 at 13:15


Well, no, not really, because this still leaves a horrible set of `if`/ `elif` which has to iterate the entire dataset each time. Use [cut](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.cut.html). I'm on a phone so I can't illustrate, but you potentially can. – roganjosh Dec 15 '18 at 13:43
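For reference, the `pandas.cut` approach suggested in the comments might look something like the sketch below. The sample `df` is invented, and the bin edges are an assumption read off the ranges in the question (11 to 20, 21 to 25, and so on, with integer ages); ages outside 11 to 60 get the question's fallback value of 100.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"age": [11, 20, 21, 25, 33, 45, 60, 5, 70]})

# Right-closed intervals: (10, 20], (20, 25], ..., (50, 60].
bins = [10, 20, 25, 30, 35, 40, 50, 60]
labels = [20, 25, 30, 35, 40, 50, 60]

binned = pd.cut(df["age"], bins=bins, labels=labels)

# Ages outside every bin come back as NaN; convert the categorical
# result to float so the gaps can be filled with the fallback value.
df["age_enc"] = binned.astype(float).fillna(100).astype(int)

print(df)
```

This bins the whole column in one pass instead of filtering once per range.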

score 0 · Answer 2 · answered Dec 15 '18 at 14:09

Two things are going wrong here. First, you have to split the chained comparison into two separate comparisons. Second, each comparison yields a boolean Series, which you can't use directly as a condition because its truth value is ambiguous. You can do this:

if (11 <= df['age']).all() and (df['age'] <= 20).all():
    ...

That's pretty verbose, but this won't work, because the chained comparison itself tries to evaluate a Series as a single boolean:

if (11 <= df['age'] <= 20).all():
    ...

Note that I replaced >= with <=, and that you can use any or all, whichever suits your case. Let me know if this worked for you.
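To make the difference concrete, here is a small sketch with a made-up Series named `ages`. It shows the element-wise result of the split comparison, what `all` and `any` reduce it to, and that the chained form raises the ValueError from the question.

```python
import pandas as pd

# Hypothetical ages, only for illustration.
ages = pd.Series([15, 18, 42])

# Split comparison combined with &: one boolean per row.
in_range = (11 <= ages) & (ages <= 20)
print(in_range.tolist())

# all() and any() collapse the Series into a single boolean.
print(in_range.all())
print(in_range.any())

# The chained form calls bool() on a Series internally and fails.
raised = False
try:
    11 <= ages <= 20
except ValueError:
    raised = True
print(raised)
```

Note that `all` and `any` answer a question about the whole column, so they do not by themselves assign a different value to each row.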

score 0 · Answer 3 · answered Dec 15 '18 at 14:10

I think I found the simplest solution:

# encoding
from sklearn import preprocessing
le1 = preprocessing.LabelEncoder()
le1.fit(df['age'])
df['age_enc'] = le1.transform(df['age'])
# build and print the mapping from original ages to encoded labels
keys = le1.classes_
values = le1.transform(le1.classes_)
dictionary = dict(zip(keys, values))
print(dictionary)

Because I want to recode my variable and create a new column, this may be the simplest way.
