Pandas return value from multiple columns if equal to value in another column

Question

I have a Pandas dataframe like this:

  A      B        C         D
0 month   month+1 quarter+1 season+1
1 season  month+5 quarter+3 season+2
2 day     month+1 quarter+2 season+1
3 year    month+3 quarter+4 season+2
4 quarter month+2 quarter+1 season+1
5 month   month+4 quarter+1 season+2

I would like to insert a new column called 'E' based on several IF conditions. If column 'A' equals 'month' then return values in 'B', if column 'A' equals 'quarter' then return values in 'C', if column 'A' equals 'season' then return values in 'D', and if not then return values in column 'A'

  A      B        C         D        E
0 month   month+1 quarter+1 season+1 month+1
1 season  month+5 quarter+3 season+2 season+2
2 day     month+1 quarter+2 season+1 day
3 year    month+3 quarter+4 season+2 year
4 quarter month+2 quarter+1 season+1 quarter+1
5 month   month+4 quarter+1 season+2 month+4

I am having trouble doing this. I have tried playing around with a function but it did not work. See my attempt:

def f(row):
    if row['A'] == 'month':
        val = ['B']
    elif row['A'] == 'quarter':
        val = ['C']
    elif row['A'] == 'season':
        val = ['D']
    else:
        val = ['A']
    return val

df['E'] = df.apply(f, axis=1)

EDITED: to change the last else to column 'A'

ansev · Answer 1 · 2020-01-14T11:36:35.820

Frist, I recommend you see: When should I want to use apply() in my code.

I would use Series.replace

df['E'] = df['A'].replace(['month','quarter','season'],
                          [df['B'], df['C'], df['D']])

or numpy.select

cond = [df['A'].eq('month'), df['A'].eq('quarter'), df['A'].eq('season')]
values= [df['B'], df['C'], df['D']]
df['E']=np.select(cond,values,default=df['A'])

  A      B        C         D        E
0 month   month+1 quarter+1 season+1 month+1
1 season  month+5 quarter+3 season+2 season+2
2 day     month+1 quarter+2 season+1 day
3 year    month+3 quarter+4 season+2 year
4 quarter month+2 quarter+1 season+1 quarter+1
5 month   month+4 quarter+1 season+2 month+4

score 3 · Accepted Answer · edited Jan 14 '20 at 11:21

3

Just use np.select

c1 = df['A'] == 'month'
c2 = df['A'] == 'quarter'
c3 = df['A'] == 'season'

df['E'] = np.select([c1, c2, c3], [df['B'], df['C'], df['D']], df['A'])

Out[271]:
         A        B          C         D          E
0    month  month+1  quarter+1  season+1    month+1
1   season  month+5  quarter+3  season+2   season+2
2      day  month+1  quarter+2  season+1        day
3     year  month+3  quarter+4  season+2       year
4  quarter  month+2  quarter+1  season+1  quarter+1
5    month  month+4  quarter+1  season+2    month+4

edited Jan 14 '20 at 11:21

ansev

30,322
5
17
31

answered Jan 14 '20 at 10:42

Andy L.

24,909
4
17
29

@ansev: He mentioned that his code doesn't work. His desired ouput is what he wants. My output matches his desired output. You solution is actually wrong because you base on his incomplete codes – Andy L. Jan 14 '20 at 11:03
@ansev: I quote OP words : `I have tried playing around with a function but it did not work. See my attempt`. HIs description: `..... if not then return values in column 'A'.....` – Andy L. Jan 14 '20 at 11:04
1

True, he wrote else ['D'] but he said that by default it should be 'A', I didn't see it, +1 – ansev Jan 14 '20 at 11:06
1

I would use `df['A']` instead `df.A`, because OP could try `df.T` – ansev Jan 14 '20 at 11:19

Jimmar · Answer 3 · 2020-01-14T10:51:18.920

1

You probably need to fix your code like this:

def f(row):
    if row['A'] == 'month':
        val = row['B']
    elif row['A'] == 'quarter':
        val = row['C']
    elif row['A'] == 'season':
        val = row['D']
    else:
        val = row['D']
    return val

df['E'] = df.apply(f, axis=1)

note: you forgot to include row

val = ['B'] # before
val = row['B'] # after

Edit: This is just to point out the problem in the code, for better approaches check out the other answers related to the usage of numpy.select

edited Jan 14 '20 at 10:51

answered Jan 14 '20 at 10:35

Jimmar

4,194
2
28
43

I just showed what was wrong in the code, wasn't trying to optimize – Jimmar Jan 14 '20 at 10:49

Pandas return value from multiple columns if equal to value in another column

3 Answers3