2

I have a Pandas dataframe like this:

  A      B        C         D
0 month   month+1 quarter+1 season+1
1 season  month+5 quarter+3 season+2
2 day     month+1 quarter+2 season+1
3 year    month+3 quarter+4 season+2
4 quarter month+2 quarter+1 season+1
5 month   month+4 quarter+1 season+2

I would like to insert a new column called 'E' based on several IF conditions. If column 'A' equals 'month' then return values in 'B', if column 'A' equals 'quarter' then return values in 'C', if column 'A' equals 'season' then return values in 'D', and if not then return values in column 'A'

  A      B        C         D        E
0 month   month+1 quarter+1 season+1 month+1
1 season  month+5 quarter+3 season+2 season+2
2 day     month+1 quarter+2 season+1 day
3 year    month+3 quarter+4 season+2 year
4 quarter month+2 quarter+1 season+1 quarter+1
5 month   month+4 quarter+1 season+2 month+4

I am having trouble doing this. I have tried playing around with a function but it did not work. See my attempt:

def f(row):
    if row['A'] == 'month':
        val = ['B']
    elif row['A'] == 'quarter':
        val = ['C']
    elif row['A'] == 'season':
        val = ['D']
    else:
        val = ['A']
    return val

df['E'] = df.apply(f, axis=1)

EDITED: to change the last else to column 'A'

thor
  • 281
  • 1
  • 6
  • 13

3 Answers3

4

Frist, I recommend you see: When should I want to use apply() in my code.

I would use Series.replace

df['E'] = df['A'].replace(['month','quarter','season'],
                          [df['B'], df['C'], df['D']]) 

or numpy.select

cond = [df['A'].eq('month'), df['A'].eq('quarter'), df['A'].eq('season')]
values= [df['B'], df['C'], df['D']]
df['E']=np.select(cond,values,default=df['A'])

  A      B        C         D        E
0 month   month+1 quarter+1 season+1 month+1
1 season  month+5 quarter+3 season+2 season+2
2 day     month+1 quarter+2 season+1 day
3 year    month+3 quarter+4 season+2 year
4 quarter month+2 quarter+1 season+1 quarter+1
5 month   month+4 quarter+1 season+2 month+4
ansev
  • 30,322
  • 5
  • 17
  • 31
3

Just use np.select

c1 = df['A'] == 'month'
c2 = df['A'] == 'quarter'
c3 = df['A'] == 'season'

df['E'] = np.select([c1, c2, c3], [df['B'], df['C'], df['D']], df['A'])

Out[271]:
         A        B          C         D          E
0    month  month+1  quarter+1  season+1    month+1
1   season  month+5  quarter+3  season+2   season+2
2      day  month+1  quarter+2  season+1        day
3     year  month+3  quarter+4  season+2       year
4  quarter  month+2  quarter+1  season+1  quarter+1
5    month  month+4  quarter+1  season+2    month+4
ansev
  • 30,322
  • 5
  • 17
  • 31
Andy L.
  • 24,909
  • 4
  • 17
  • 29
  • @ansev: He mentioned that his code doesn't work. His desired ouput is what he wants. My output matches his desired output. You solution is actually wrong because you base on his incomplete codes – Andy L. Jan 14 '20 at 11:03
  • @ansev: I quote OP words : `I have tried playing around with a function but it did not work. See my attempt`. HIs description: `..... if not then return values in column 'A'.....` – Andy L. Jan 14 '20 at 11:04
  • 1
    True, he wrote else ['D'] but he said that by default it should be 'A', I didn't see it, +1 – ansev Jan 14 '20 at 11:06
  • 1
    I would use `df['A']` instead `df.A`, because OP could try `df.T` – ansev Jan 14 '20 at 11:19
1

You probably need to fix your code like this:

def f(row):
    if row['A'] == 'month':
        val = row['B']
    elif row['A'] == 'quarter':
        val = row['C']
    elif row['A'] == 'season':
        val = row['D']
    else:
        val = row['D']
    return val

df['E'] = df.apply(f, axis=1)

note: you forgot to include row

val = ['B'] # before
val = row['B'] # after

Edit: This is just to point out the problem in the code, for better approaches check out the other answers related to the usage of numpy.select

Jimmar
  • 4,194
  • 2
  • 28
  • 43