I have a column "PreHeat" and I would like to create a new column "PreHeat_Outlier_TestX" that records whether each value is an outlier or not (True, False).

I can manage it with one condition:

df['PreHeat_Outlier_TestX'] = (df['PreHeat'] > df['PreHeat'].quantile(0.75))

But when I tried to use or, or an if/elif/else block, I got the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

if (df['PreHeat'] > df['PreHeat'].quantile(0.75)):
    df['PreHeat_Outlier_TestX'] = True
elif (df['PreHeat'] < df['PreHeat'].quantile(0.25)):
    df['PreHeat_Outlier_TestX'] = True
else: 
    df['PreHeat_Outlier_TestX'] = False


if (df['PreHeat'] > df['PreHeat'].quantile(0.75)) or (df['PreHeat'] < df['PreHeat'].quantile(0.25)):
   df['PreHeat_Outlier_TestX'] = True
else:
   df['PreHeat_Outlier_TestX'] = False

I am not sure what is wrong with the code. Could somebody help me?

Pit_66
  • The condition following `if` should always be a single boolean, `True` or `False`. You passed a Series, which cannot be cast to a boolean. Check out `np.select`. – Quang Hoang Oct 18 '20 at 19:00

1 Answer

The condition following if should always be a single boolean, True or False. You passed a Series, which cannot be cast to a boolean. Check out np.select.

The same applies to or: it also needs a single boolean on each side, not a Series. Also see this question.
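
For a Series, the element-wise counterpart of or is the | operator, with each comparison wrapped in parentheses. A minimal sketch, assuming a small made-up DataFrame (only the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"PreHeat": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]})

# Compute the quartiles once so they are not recomputed per comparison.
q25 = df["PreHeat"].quantile(0.25)
q75 = df["PreHeat"].quantile(0.75)

# `|` combines the two boolean Series element by element.
# The parentheses are required because `|` binds tighter than `>` and `<`.
df["PreHeat_Outlier_TestX"] = (df["PreHeat"] > q75) | (df["PreHeat"] < q25)

print(df)
```

Each row gets its own True or False, so no single truth value of the whole Series is ever needed.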

What you are trying to do with

if (df['PreHeat'] > df['PreHeat'].quantile(0.75)):
    df['PreHeat_Outlier_TestX'] = True
elif (df['PreHeat'] < df['PreHeat'].quantile(0.25)):
    df['PreHeat_Outlier_TestX'] = True
else: 
    df['PreHeat_Outlier_TestX'] = False

should be done with

df['PreHeat_Outlier_TestX'] = ~df['PreHeat'].between(df['PreHeat'].quantile(0.25), df['PreHeat'].quantile(0.75))
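
The np.select route mentioned earlier could look roughly like the sketch below. The sample data is made up for illustration, and the last line only checks that the result agrees with the between version on that data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"PreHeat": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]})

q25 = df["PreHeat"].quantile(0.25)
q75 = df["PreHeat"].quantile(0.75)

# One condition per branch of the original if/elif, evaluated row by row.
conditions = [df["PreHeat"] > q75, df["PreHeat"] < q25]
# The value to assign where the matching condition holds.
choices = [True, True]

# Rows matching no condition fall through to `default`, like the else branch.
df["PreHeat_Outlier_TestX"] = np.select(conditions, choices, default=False)

# Same result as negating between(), which is inclusive on both ends by default.
same = (df["PreHeat_Outlier_TestX"] == ~df["PreHeat"].between(q25, q75)).all()
print(same)
```

np.select is more useful when the branches assign different values; for a plain True/False flag the between one-liner is shorter.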
Quang Hoang