I have a DataFrame named value of size 567 with a column named index, as follows:

index
96.875
96.6796875
96.58203125
96.38671875
95.80078125
94.7265625
94.62890625
94.3359375
58.88671875
58.7890625
58.69140625
58.59375
58.49609375
58.3984375
58.30078125
58.203125

I also have 2 additional variables:

mu = 56.80877955613938

sigma = 17.78935620293665

I want to check each value in the index column. If a value is greater than or equal to, say, mu+3*sigma, the matching row of a new column named alarm in the value df should be set to 4, with lower levels for the lower thresholds.

I tried:

for i in value['index']:
    if (i >= mu+3*sigma):
        value['alarm'] = 4
    elif ((i < mu+3*sigma) and (i >= mu+2*sigma)):
        value['alarm'] = 3
    elif((i < mu+2*sigma) and (i >= mu+sigma)):
        value['alarm'] = 2
    elif ((i < mu+sigma) and (i >= mu)):
        value['alarm'] = 1

But it creates an alarm column and fills it completely with 1.

What mistake am I making here?

Expected output:

index            alarm
96.875             3
96.6796875         3
96.58203125        3
96.38671875        3
95.80078125        3
94.7265625         3
94.62890625        3
94.3359375         3
58.88671875        1
58.7890625         1
58.69140625        1
58.59375           1
58.49609375        1
58.3984375         1
58.30078125        1
58.203125          1
some_programmer

1 Answer

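The loop fills the column with 1 because an assignment such as value['alarm'] = 1 writes that scalar to every row, so whichever branch runs last (here the one for 58.203125) overwrites the whole column. With multiple conditions you don't want to loop through your dataframe with if, elif, else anyway. A better solution is np.select, where we define a list of conditions and a matching list of choices: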

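import numpy as np

# np.select takes, for each row, the choice of the first condition that is True
conditions = [
    value['index'] >= mu+3*sigma,
    (value['index'] < mu+3*sigma) & (value['index'] >= mu+2*sigma),
    (value['index'] < mu+2*sigma) & (value['index'] >= mu+sigma),
]

choices = [4, 3, 2]

# default=1 covers every remaining row, including values below mu
value['alarm'] = np.select(conditions, choices, default=1)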
value

           alarm
index           
96.875000      3
96.679688      3
96.582031      3
96.386719      3
95.800781      3
94.726562      3
94.628906      3
94.335938      3
58.886719      1
58.789062      1
58.691406      1
58.593750      1
58.496094      1
58.398438      1
58.300781      1
58.203125      1
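
To see why the loop from the question ends up with the same number in every row, here is a minimal sketch that reruns it on four of the posted values and records the alarm column after each iteration. The `history` list is only an illustration aid and is not part of the original code.

```python
import pandas as pd

mu = 56.80877955613938
sigma = 17.78935620293665

# Four of the values shown in the question, in the same order.
value = pd.DataFrame({"index": [96.875, 94.3359375, 58.88671875, 58.203125]})

history = []  # hypothetical helper: snapshot of the column after each iteration
for i in value["index"]:
    if i >= mu + 3 * sigma:
        value["alarm"] = 4  # a scalar assignment sets EVERY row, not just this one
    elif (i < mu + 3 * sigma) and (i >= mu + 2 * sigma):
        value["alarm"] = 3
    elif (i < mu + 2 * sigma) and (i >= mu + sigma):
        value["alarm"] = 2
    elif (i < mu + sigma) and (i >= mu):
        value["alarm"] = 1
    history.append(value["alarm"].tolist())

for snapshot in history:
    print(snapshot)
```

Each snapshot holds a single repeated value: the column is all 3 while the loop is on the large values and all 1 once it reaches the small ones, so only the last row decides the final result.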

If you have ten minutes to spare, here's a good post by CS95 explaining why looping over a dataframe is bad practice.
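
As another loop-free option (a swapped-in technique, not the one used above), the same thresholds can be expressed as bin edges for `pd.cut`. This is a sketch under the assumption that, as with `default=1`, everything below mu+sigma should get alarm level 1; the four-row frame is just a sample of the question's data.

```python
import numpy as np
import pandas as pd

mu = 56.80877955613938
sigma = 17.78935620293665

# Four of the values shown in the question.
value = pd.DataFrame({"index": [96.875, 94.3359375, 58.88671875, 58.203125]})

# Bin edges; right=False makes each bin closed on the left, i.e. [low, high),
# which matches the ">= lower bound, < upper bound" tests in the conditions.
bins = [-np.inf, mu + sigma, mu + 2 * sigma, mu + 3 * sigma, np.inf]

value["alarm"] = pd.cut(
    value["index"], bins=bins, labels=[1, 2, 3, 4], right=False
).astype(int)  # pd.cut returns a categorical; convert it to plain integers

print(value)
```

Adding another alarm level then only means adding one edge and one label, rather than another pair of comparisons.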

Erfan
  • This works only when I use `value['index']` instead of `value.index`. Any idea why the latter doesn't work but the former does? – some_programmer Jun 25 '19 at 11:18
  • I thought your `index` column was actually the index of your dataframe. With `value['index']` we select the column called _index_. This is why you should never name a column _index_: it causes confusion. @Junkrat – Erfan Jun 25 '19 at 22:11
  • Oh alright.. I actually didn't name my column like that. Just before I do these steps, I use `.reset_index()` and that creates the `index` column. I understand it now. thanks – some_programmer Jun 25 '19 at 22:45
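
The difference discussed in these comments can be reproduced with a small sketch. The frame `df` and its `reading` column are hypothetical stand-ins; the only assumption taken from the thread is that the values start out in an unnamed index and `.reset_index()` is called before the alarm logic runs.

```python
import pandas as pd

# Hypothetical starting point: the values of interest are the (unnamed) index.
df = pd.DataFrame(
    {"reading": [1, 2, 3]},
    index=[96.875, 94.3359375, 58.203125],
)

# reset_index() moves an unnamed index into a new column called "index"
# and replaces the index itself with a default 0, 1, 2, ... range.
value = df.reset_index()

print(value.columns.tolist())   # column labels, now including "index"
print(value["index"].tolist())  # the column: the original float values
print(value.index.tolist())     # the attribute: the new row labels
```

After the reset, `value['index']` holds the measurements while `value.index` holds only the row numbers, which is why comparing `value.index` against the thresholds does not give the intended alarms.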