I have a df like this,

ID  Machine 17-Dec  18-Jan  18-Feb  18-Mar  18-Apr  18-May
160 Car     348     280     274     265     180     224
163 Var     68248   72013   55441   64505   71097   78006
165 Assus   1337    1279    1536    1461    1555    1700
215 Owen    118     147     104     143     115     153

I calculate the mean and standard deviation like this:

df['Avg'] = np.mean(all_np_values, axis=1)

df['Std.Dev'] = np.std(all_np_values, axis=1)
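For reference, here is a minimal sketch of how these two columns might be reproduced end to end. The question does not show how `all_np_values` was built, so selecting the six month columns and converting them with `to_numpy()` is an assumption. Note that `np.std` defaults to the population standard deviation (`ddof=0`).

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [160, 163, 165, 215],
    "Machine": ["Car", "Var", "Assus", "Owen"],
    "17-Dec": [348, 68248, 1337, 118],
    "18-Jan": [280, 72013, 1279, 147],
    "18-Feb": [274, 55441, 1536, 104],
    "18-Mar": [265, 64505, 1461, 143],
    "18-Apr": [180, 71097, 1555, 115],
    "18-May": [224, 78006, 1700, 153],
})

# Assumption: all_np_values holds only the monthly columns as a 2-D array.
month_cols = ["17-Dec", "18-Jan", "18-Feb", "18-Mar", "18-Apr", "18-May"]
all_np_values = df[month_cols].to_numpy()

# axis=1 computes one statistic per row (per machine).
df["Avg"] = np.mean(all_np_values, axis=1)
df["Std.Dev"] = np.std(all_np_values, axis=1)  # ddof=0, population std
```

If the sample standard deviation is wanted instead, `np.std(all_np_values, axis=1, ddof=1)` would be the usual change.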

Then I get the following data frame.

ID  Machine 17-Dec  18-Jan  18-Feb  18-Mar  18-Apr  18-May  Avg      Std.Dev
160 Car     348     280     274     265     180     224     261.83   51.70
163 Var     68248   72013   55441   64505   71097   78006   68218.33 7018.24
165 Assus   1337    1279    1536    1461    1555    1700    1478     140.44
215 Owen    118     147     104     143     115     153     130      18.40

Now I want a final dataframe like the one below: for each row, look at the 18-May value and mark 'Yes' or 'No' depending on whether it is more than 2 standard deviations above or below the mean.

ID  Machine 17-Dec  18-Jan  18-Feb  18-Mar  18-Apr  18-May  Avg      Std.Dev    Above   Below
160 Car     348     280     274     265     180     224     261.83   51.70      No      No
163 Var     68248   72013   55441   64505   71097   78006   68218.33 7018.24    No      No
165 Assus   1337    1279    1536    1461    1555    1700    1478     140.44     No      No
215 Owen    118     147     104     143     115     153     130      18.40      No      No

I tried to do the following,

for value in df['18-May']:
    if value > (df['Avg'] + 2 * df['Std.Dev']):
        df['Above'] = 'Yes'
    else:
        df['Above'] = 'No'

This gives me an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand the error after reading some older posts. My conclusion is that the comparison returns a Series of bool values rather than a single bool, so `if` cannot evaluate it.
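A small sketch of what is going on, using the rounded `Avg` and `Std.Dev` figures from the table above. Inside the loop, `value` is a scalar but the right-hand side is a whole Series, so the comparison produces one bool per row. The second half shows one way a plain loop could work, by pairing each value with its own threshold; the names `upper`, `raised`, and `above` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "18-May": [224, 78006, 1700, 153],
    "Avg": [261.83, 68218.33, 1478.0, 130.0],
    "Std.Dev": [51.70, 7018.24, 140.44, 18.40],
})

# One upper threshold per row; this is a Series, not a single number.
upper = df["Avg"] + 2 * df["Std.Dev"]

# scalar > Series gives a boolean Series, which `if` cannot reduce to
# a single True/False, hence the ValueError from the question.
try:
    if 224 > upper:
        pass
    raised = False
except ValueError:
    raised = True

# A loop that works: compare each value with the threshold of the same row.
above = []
for value, limit in zip(df["18-May"], upper):
    above.append("Yes" if value > limit else "No")
df["Above"] = above
```

Note also that `df['Above'] = 'Yes'` inside a loop assigns the whole column at once, so even without the error only the last iteration would survive.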

I am not sure how to use a mask when creating the new 'Above' and 'Below' columns so that they hold 'Yes' or 'No'. How can I add that to my code above?

Any thoughts would be helpful.

user9431057
  • Use `df['Above'] = np.where(df['18-May'] > (df['Avg'] + 2 * df['Std.Dev']), 'Yes', 'No')` – Zero Aug 21 '18 at 18:03
  • @Zero Thanks, this seems to be working. Is there a way to do it without using `np.where`? – user9431057 Aug 21 '18 at 18:06
  • Why are you against using `np.where`? – rahlf23 Aug 21 '18 at 18:14
  • @rahlf23 Well, I am not against `np.where`; it seems to work great, but I am wondering what is wrong with the traditional way using `for` loops. The more I read, the more it seems `pandas` works best with `np.where` ;) – user9431057 Aug 21 '18 at 18:16
  • for loops are going to be considerably slower as the size of your dataframe increases – rahlf23 Aug 21 '18 at 18:17
  • @rahlf23 Yeah, the more I read, the more I find that `for` loops can be computationally expensive. `Pandas` is a whole different level ;) – user9431057 Aug 21 '18 at 18:23
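Pulling the thread together, here is a hedged sketch of both vectorized routes for the two columns: `np.where` for 'Above', and a boolean mask mapped to labels with `Series.map` for 'Below' as one possible way to avoid `np.where`. The `Avg` and `Std.Dev` values are the rounded figures from the question, and `upper` and `lower` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Machine": ["Car", "Var", "Assus", "Owen"],
    "18-May": [224, 78006, 1700, 153],
    "Avg": [261.83, 68218.33, 1478.0, 130.0],
    "Std.Dev": [51.70, 7018.24, 140.44, 18.40],
})

# Row-wise thresholds, two standard deviations either side of the mean.
upper = df["Avg"] + 2 * df["Std.Dev"]
lower = df["Avg"] - 2 * df["Std.Dev"]

# Option 1: np.where picks a label per row from a boolean mask.
df["Above"] = np.where(df["18-May"] > upper, "Yes", "No")

# Option 2: build the mask first, then map True/False to labels.
df["Below"] = (df["18-May"] < lower).map({True: "Yes", False: "No"})
```

With the sample data every 18-May value falls inside its band, so both columns come out as 'No' for all four rows, matching the expected output in the question.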
