43

If I have a dataframe df with column x and want to create column y based on the values of x, using this logic in pseudocode:

if df['x'] < -2 then df['y'] = 1 
else if df['x'] > 2 then df['y'] = -1 
else df['y'] = 0

How would I achieve this? I assume np.where is the best way to do this, but I'm not sure how to code it correctly.

tdy
azuric
  • 2
    Note, there is going to be an additional way to do this with the assign() method in pandas 0.16.0 (due any day now?), similar to dplyr mutate: http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#dataframe-assign – JohnE Mar 06 '15 at 13:50
  • 3
    See [this answer](https://stackoverflow.com/a/19913845) using `np.where` for two choices and `np.select` for more choices. – Paul Rougieux May 14 '19 at 10:01

5 Answers

68

One simple method would be to assign the default value first and then perform 2 loc calls:

In [66]:

df = pd.DataFrame({'x':[0,-3,5,-1,1]})
df
Out[66]:
   x
0  0
1 -3
2  5
3 -1
4  1

In [69]:

df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1
df
Out[69]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0

If you wanted to use np.where then you could do it with a nested np.where:

In [77]:

df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))
df
Out[77]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0

Here the outer np.where tests the first condition: where x is less than -2, it returns 1. Everywhere else it falls through to the inner np.where, which returns -1 where x is greater than 2 and 0 otherwise.

Timings

In [79]:

%timeit df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))

1000 loops, best of 3: 1.79 ms per loop

In [81]:

%%timeit
df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1

100 loops, best of 3: 3.27 ms per loop

So for this sample dataset the np.where method is roughly twice as fast (1.79 ms vs. 3.27 ms per loop).
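The timings above come from a five-row frame, where fixed overhead dominates. A rough sketch for repeating the comparison on a larger frame follows; the frame size, the seed, and the helper names `with_where` and `with_loc` are assumptions for illustration and are not part of the original answer. Note that `with_loc` also pays for a copy of the frame so that each run starts from the same state.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 100,000 random integers in [-5, 5).
rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.integers(-5, 5, size=100_000)})


def with_where(frame):
    # Nested np.where: the outer condition is checked first, 0 is the fallback.
    return np.where(frame['x'] < -2, 1, np.where(frame['x'] > 2, -1, 0))


def with_loc(frame):
    # Default value first, then two masked assignments on a copy.
    out = frame.copy()
    out['y'] = 0
    out.loc[out['x'] < -2, 'y'] = 1
    out.loc[out['x'] > 2, 'y'] = -1
    return out['y'].to_numpy()


# Both approaches should produce identical labels.
same = np.array_equal(with_where(df), with_loc(df))
print('results match:', same)

for fn in (with_where, with_loc):
    seconds = timeit.timeit(lambda: fn(df), number=20) / 20
    print(f'{fn.__name__}: {seconds * 1e3:.2f} ms per call')
```

Absolute numbers will vary by machine and by pandas/NumPy version, so treat the output as a relative comparison only.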

EdChum
9

Use np.select for multiple conditions

np.select(condlist, choicelist, default=0)

  • Return elements in choicelist depending on the corresponding condition in condlist.
  • The default element is used when all conditions evaluate to False.

condlist = [
    df['x'] < -2,
    df['x'] > 2,
]
choicelist = [
    1,
    -1,
]
df['y'] = np.select(condlist, choicelist, default=0)

np.select is much more readable than a nested np.where but just as fast. The comparison used random integer data of size n:

df = pd.DataFrame({'x': np.random.randint(-5, 5, size=n)})
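As a minimal self-contained sketch, the snippet below checks that np.select and the nested np.where produce the same labels. The value of `n` and the seed are assumptions, since the line above leaves `n` undefined.

```python
import numpy as np
import pandas as pd

n = 1_000  # hypothetical size; the original snippet does not define n
rng = np.random.default_rng(42)
df = pd.DataFrame({'x': rng.integers(-5, 5, size=n)})

# Conditions are checked in order; the first one that is True wins.
condlist = [df['x'] < -2, df['x'] > 2]
choicelist = [1, -1]

y_select = np.select(condlist, choicelist, default=0)
y_where = np.where(df['x'] < -2, 1, np.where(df['x'] > 2, -1, 0))

# The two arrays should be element-wise identical.
print(np.array_equal(y_select, y_where))
```

Adding a third condition with np.select means appending one entry to each list, whereas the nested form needs another level of nesting.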

tdy
5

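This is a good use case for pd.cut, where you define ranges and assign a label to each range. Note that with right=False the bins are [-inf, -2), [-2, 2) and [2, inf), so x == 2 is labelled -1 rather than 0 as in the question's pseudocode, and the resulting column has a categorical dtype: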

df['y'] = pd.cut(df['x'], [-np.inf, -2, 2, np.inf], labels=[1, 0, -1], right=False)

Output

   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0
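The sketch below illustrates how the bin edges behave at the boundaries. With right=False the bins are left-closed, so x == 2 lands in the last bin. The `np.nextafter` adjustment is an assumed workaround, not part of the original answer, for getting the strict x > 2 comparison from the question; the column names `y_as_written` and `y_strict` are made up for the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [-3, -2, 0, 2, 5]})

# As in the answer: left-closed bins [-inf, -2), [-2, 2), [2, inf).
as_written = pd.cut(df['x'], [-np.inf, -2, 2, np.inf],
                    labels=[1, 0, -1], right=False)

# Assumed workaround: move the upper edge to the next float above 2,
# so that exactly 2 stays in the middle bin and only x > 2 maps to -1.
upper = np.nextafter(2.0, np.inf)
strict = pd.cut(df['x'], [-np.inf, -2, upper, np.inf],
                labels=[1, 0, -1], right=False)

# pd.cut returns a categorical; cast to int for a plain numeric column.
df['y_as_written'] = as_written.astype(int)
df['y_strict'] = strict.astype(int)
print(df)
```

The two columns differ only in the row where x is exactly 2.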
Erfan
0

Set a fixed value in column 'c2' on the rows where the condition is met:

df.loc[df['c1'] == 'Value', 'c2'] = 10
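A small runnable sketch of this pattern, using made-up sample data: if 'c2' does not exist yet, the assignment creates it, and rows where the condition is False are left as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'c1': ['Value', 'Other', 'Value']})

# Rows matching the condition get 10; 'c2' is created on the fly.
df.loc[df['c1'] == 'Value', 'c2'] = 10
print(df)

# Non-matching rows hold NaN until a default is filled in.
filled = df['c2'].fillna(0)
print(filled.tolist())
```

Assigning a default to the column first, as in the accepted answer, avoids the NaN step altogether.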
Hasan Zafari
-1

You can do it easily using the index and 2 loc calls:

df = pd.DataFrame({'x':[0,-3,5,-1,1]})

df

   x
0  0
1 -3
2  5
3 -1
4  1
    
df['y'] = 0
idx_1 = df.loc[df['x'] < -2, 'y'].index
idx_2 = df.loc[df['x'] >  2, 'y'].index
df.loc[idx_1, 'y'] =  1
df.loc[idx_2, 'y'] = -1

df

   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0
  • 1
    Isn't this just a more verbose and probably slower way to write the [existing accepted answer from 3 years ago](https://stackoverflow.com/a/28896853/6243352)? There's no need for another layer of `.loc` calls which then need to be `.index`ed. – ggorlen Jan 28 '23 at 21:40
  • Because it works, it is the way I use it, it is up to date, and it is probably easier to understand. – Alexander Martins Jan 30 '23 at 13:08