Creating a new column based on if-elif-else condition

Question

I have a DataFrame df:

I want to create a new column based on the following criteria:

if row A == B: 0

if rowA > B: 1

if row A < B: -1

so given the above table, it should be:

    A    B    C
a   2    2    0
b   3    1    1
c   1    3   -1

For typical if else cases I do np.where(df.A > df.B, 1, -1), does pandas provide a special syntax for solving my problem with one step (without the necessity of creating 3 new columns and then combining the result)?

You could just define a function and pass this to `apply` and set `axis=1` would work, not sure I can think of an operation that would give you what you want — EdChum, Feb 11 '14 at 12:58
Your solution imply creating 3 columns and combining them into 1 column, or you have something different in mind? — nutship, Feb 11 '14 at 13:00
You keep saying "creating 3 columns", but I'm not sure what you're referring to. — DSM, Feb 11 '14 at 13:01
@DSM has answered this question but I meant something like `df['C']=df.apply(myFunc(row), axis=1)` where myFunc does what you want, this does not involve creating '3 columns' — EdChum, Feb 11 '14 at 13:05
Possible duplicate of [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) — Georgy, May 02 '18 at 17:34

Zelazny7 · Accepted Answer · 2020-11-11T12:29:19.327

251

To formalize some of the approaches laid out above:

Create a function that operates on the rows of your dataframe like so:

def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val

Then apply it to your dataframe passing in the axis=1 option:

In [1]: df['C'] = df.apply(f, axis=1)

In [2]: df
Out[2]:
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1

Of course, this is not vectorized so performance may not be as good when scaled to a large number of records. Still, I think it is much more readable. Especially coming from a SAS background.

Edit

Here is the vectorized version

df['C'] = np.where(
    df['A'] == df['B'], 0, np.where(
    df['A'] >  df['B'], 1, -1))

edited Nov 11 '20 at 12:29

answered Feb 11 '14 at 20:04

Zelazny7

39,946
18
70
84

5

What if I want to pass another parameter along with row in the function? If I do, it says row not defined.. – prashanth manohar Mar 22 '17 at 10:29
3

You have to use the `args` parameter of the `.apply` function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html – Zelazny7 Mar 22 '17 at 13:31
1

I'm an old SAS user learning Python, and there's definitely a learning curve! :-) For example, the above code could be written in SAS as: `data df; set df; if A=B then C=0; else if A>B then C=1; else C=-1; run;` Very elegant and simple. – RobertF Feb 22 '19 at 16:24
thanks for the answer. However, I could not understand why `axis=1` is required. Without passing `axis=1` the row had only 1st column. – Jitendra Apr 26 '19 at 13:19
@Zelazny7 could you please give a vectorized version? – Averell Nov 11 '20 at 04:39
For the unvectorized, your function contains arguments - and so it would be: df['c'] = df(lambda row: f(row), axis=1) – Deskjokey Jan 12 '22 at 06:48

Brian · Answer 2 · 2020-03-05T03:06:22.960

89

df.loc[df['A'] == df['B'], 'C'] = 0
df.loc[df['A'] > df['B'], 'C'] = 1
df.loc[df['A'] < df['B'], 'C'] = -1

Easy to solve using indexing. The first line of code reads like so, if column A is equal to column B then create and set column C equal to 0.

edited Mar 05 '20 at 03:06

answered Oct 17 '16 at 16:40

Brian

2,163
1
14
26

score 18 · Answer 3 · answered Feb 11 '14 at 13:02

18

For this particular relationship, you could use np.sign:

>>> df["C"] = np.sign(df.A - df.B)
>>> df
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1

answered Feb 11 '14 at 13:02

DSM

342,061
65
592
494

score 12 · Answer 4 · answered Nov 30 '20 at 20:57

When you have multiple if conditions, numpy.select is the way to go:

In [4102]: import numpy as np
In [4098]: conditions = [df.A.eq(df.B), df.A.gt(df.B), df.A.lt(df.B)]
In [4096]: choices = [0, 1, -1]

In [4100]: df['C'] = np.select(conditions, choices)

In [4101]: df
Out[4101]: 
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1

score 8 · Answer 5 · answered Sep 12 '18 at 20:23

Lets say above one is your original dataframe and you want to add a new column 'old'

If age greater than 50 then we consider as older=yes otherwise False

step 1: Get the indexes of rows whose age greater than 50

row_indexes=df[df['age']>=50].index

step 2: Using .loc we can assign a new value to column

df.loc[row_indexes,'elderly']="yes"

same for age below less than 50

row_indexes=df[df['age']<50].index

df[row_indexes,'elderly']="no"

Mykola Zotko · Answer 6 · 2021-09-20T21:12:39.313

4

You can use the method mask:

df['C'] = np.nan
df['C'] = df['C'].mask(df.A == df.B, 0).mask(df.A > df.B, 1).mask(df.A < df.B, -1)

edited Sep 20 '21 at 21:12

answered Sep 20 '21 at 20:48

Mykola Zotko

15,583
3
71
73

score 0 · Answer 7 · answered May 27 '23 at 12:32

0

a one liner solution with list comprehension and zip() is also

df.loc[:,'C']= [0 if d1==d2 else 1 if d1>d2 else -1 for d1,d2 in zip(df.A,df.B)]

which returns the desired output

answered May 27 '23 at 12:32

Stefano Verugi

101
1
5

Creating a new column based on if-elif-else condition

7 Answers7

Linked

Related