
I wrote the following if/elif chain, but when I run it, Spyder shows me this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Why does this happen? I did specify the value to assign to the column when none of the conditions is met (the else branch).

if 11 >= df['age'] <= 20:
    df['age_enc'] = 20
elif 21 >= df['age'] <= 25:
    df['age_enc'] = 25
elif 26 >= df['age'] <= 30:
    df['age_enc'] = 30
elif 31 >= df['age'] <= 35:
    df['age_enc'] = 35       
elif 36 >= df['age'] <= 40:
    df['age_enc'] = 40    
elif 41 >= df['age'] <= 50:
    df['age_enc'] = 50  
elif 51 >= df['age'] <= 60:
    df['age_enc'] = 60  
else:
    df['age_enc'] = 100;
Mr. T
Igor

3 Answers


Because df['age'] is a pandas Series rather than a single value, df['age'] <= 20 gives a Series of booleans (one per row), and Python can't tell whether you mean "any of the values" or "all of them". If you need a single True/False, reduce the comparison explicitly, e.g. (df['age'] <= 20).all() or (df['age'] <= 20).any().
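A small sketch of that ambiguity, using a made-up Series as a stand-in for df['age']:

```python
import pandas as pd

# Hypothetical ages standing in for df['age'].
ages = pd.Series([12, 19, 23])

# Comparing a Series with a number gives one boolean per row.
mask = ages <= 20
print(mask.tolist())  # [True, True, False]

# An if statement needs a single True/False, so pandas refuses to guess.
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Reduce the mask explicitly to get one truth value.
print(mask.any())  # True: at least one age is <= 20
print(mask.all())  # False: not every age is <= 20
```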

To solve your problem, you can use pandas boolean filtering instead:

# rows aged 11-20; .copy() avoids SettingWithCopyWarning on the next line
df_part = df[(df['age'] >= 11) & (df['age'] <= 20)].copy()
df_part['age_enc'] = 20

Then you can concatenate those DataFrame parts back together, for example with pd.concat.
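A minimal sketch of the same filtering idea, assuming the ranges and the fallback value 100 from the question, assigns through .loc on the original DataFrame instead of building separate parts and combining them afterwards (the sample data is made up):

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame from the question is not shown.
df = pd.DataFrame({"age": [12, 19, 23, 34, 47, 58, 70]})

# (low, high, code) triples taken from the question's if/elif chain.
bins = [(11, 20, 20), (21, 25, 25), (26, 30, 30), (31, 35, 35),
        (36, 40, 40), (41, 50, 50), (51, 60, 60)]

df["age_enc"] = 100  # default when no range matches (the question's else branch)
for low, high, code in bins:
    # between() is inclusive on both ends by default: low <= age <= high
    df.loc[df["age"].between(low, high), "age_enc"] = code

print(df["age_enc"].tolist())  # [20, 20, 25, 35, 50, 60, 100]
```

This still scans the column once per range; pd.cut, suggested in the comment on this answer, does the binning in a single call.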

NaWeeD
Well, no, not really, because this still leaves a horrible set of `if`/`elif` which has to iterate over the entire dataset each time. Use [cut](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.cut.html). I'm on a phone so I can't illustrate, but you potentially can. – roganjosh Dec 15 '18 at 13:43
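Following the comment's suggestion, here is a minimal sketch with pd.cut, assuming integer ages and the ranges from the question (the sample data is made up). Each bin (a, b] is closed on the right by default, so (10, 20] covers ages 11 to 20; ages outside every bin come back as NaN and get the question's fallback value 100:

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame from the question is not shown.
df = pd.DataFrame({"age": [12, 19, 23, 34, 47, 58, 70, 5]})

edges = [10, 20, 25, 30, 35, 40, 50, 60]   # bins (10,20], (20,25], ..., (50,60]
codes = [20, 25, 30, 35, 40, 50, 60]       # one code per bin, as in the question

binned = pd.cut(df["age"], bins=edges, labels=codes)  # categorical, NaN outside bins

# Convert the numeric categories to floats so the NaNs can be filled, then back to int.
df["age_enc"] = binned.astype(float).fillna(100).astype(int)

print(df["age_enc"].tolist())  # [20, 20, 25, 35, 50, 60, 100, 100]
```

For non-integer ages the right-closed bins behave differently from the question's integer ranges: 20.5, for example, lands in (20, 25] and gets 25.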

Two things are going wrong here. First of all, you have to split your chained comparison into two separate comparisons. Second, each comparison yields a boolean Series, which you can't use directly as a condition because its truth value is ambiguous. You can do this:

if (11 <= df['age']).all() and (df['age'] <= 20).all():
    ...

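That's pretty verbose, but the shorter form below won't work, because Python expands 11 <= df['age'] <= 20 into (11 <= df['age']) and (df['age'] <= 20), and the `and` needs the truth value of a Series, which raises the same ValueError: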

if (11 <= df['age'] <= 20).all():
    ...

Note that I changed the first >= to <=, and that you can use any() or all(), whichever suits your case. Let me know if this worked for you.
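A short sketch of why the chained form fails (the question's 11 >= df['age'] <= 20 fails the same way) and of the element-wise alternative, using a made-up Series in place of df['age']:

```python
import pandas as pd

ages = pd.Series([12, 19, 23])  # hypothetical stand-in for df['age']

# Python expands `11 <= ages <= 20` to `(11 <= ages) and (ages <= 20)`;
# the `and` needs bool() of the first Series, which raises ValueError.
try:
    11 <= ages <= 20
    chained_failed = False
except ValueError:
    chained_failed = True
print(chained_failed)  # True

# Element-wise alternative: combine the two masks with &.
# The parentheses are required because & binds tighter than <= in Python.
in_range = (11 <= ages) & (ages <= 20)
print(in_range.tolist())  # [True, True, False]
```

The & version keeps one result per row, which is what you need if each row should get its own age_enc value.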

JoshuaCS

I think I found the simplest solution:

# encoding: give each distinct age an integer label
from sklearn import preprocessing

le1 = preprocessing.LabelEncoder()
df['age_enc'] = le1.fit_transform(df['age'])

# lookup {original age: encoded label}
keys = le1.classes_
values = le1.transform(le1.classes_)
dictionary = dict(zip(keys, values))
print(dictionary)

Because I want to transcode my variable and create a new column, this may be the simplest way.
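One thing worth checking: LabelEncoder gives every distinct age its own integer code (0, 1, 2, ... in sorted order) rather than grouping ages into the ranges from the question. To show the same kind of coding without needing scikit-learn, this sketch swaps in pd.factorize with sort=True as a stand-in (the sample ages are made up):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"age": [23, 12, 47, 23]})

# sort=True numbers the distinct values in sorted order, like LabelEncoder's classes_.
codes, uniques = pd.factorize(df["age"], sort=True)
df["age_enc"] = codes

# Lookup {original age: code}, analogous to the dictionary printed above.
dictionary = dict(zip(uniques.tolist(), range(len(uniques))))
print(df["age_enc"].tolist())  # [1, 0, 2, 1]
print(dictionary)              # {12: 0, 23: 1, 47: 2}
```

Each age still gets its own code here, so 12 and 19 would not share a value the way they do in the if/elif ranges.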

Igor