Python equivalent to dplyr's ifelse

Question

I'm converting code from R to Python and am looking for help with creating (mutating) a new column based on other columns, using dfply syntax/piping.

In this example, I want to subtract 2 from col1 if col2 is 'c', and otherwise add 4.

import pandas as pd
import numpy as np
from dfply import *

col1 = [1,2,3,4,5]
col2 = ['a', 'b', 'c', 'd', 'e']

df = pd.DataFrame(data = {'col1': col1, 'col2': col2})

In R I would do:

df_new <- df %>% 
  mutate(newCol = ifelse(col2 == 'c', col1 - 2, col1 + 4))

but Python doesn't seem to like this:

new_df = (df >>
    mutate(newCol = np.where(X.col2 == 'c', X.col1 - 2, X.col1 + 4)))

I get the error "invalid __array_struct__".

Note that this works fine:

new_df = (df >>
    mutate(newCol = X.col1 - 2))

score 4 · Accepted Answer · answered Jun 18 '19 at 18:23

I would use apply with a lambda function. Here X is a single row of the dataframe, and axis=1 means the lambda is applied to each row (rather than to each column).

df['newCol'] = df.apply(lambda X: X.col1 - 2 if X.col2 == 'c' else X.col1 + 4, axis=1)
df

   col1 col2  newCol
0     1    a       5
1     2    b       6
2     3    c       1
3     4    d       8
4     5    e       9
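
If a row-by-row lambda turns out to be slow on a larger frame, a vectorized variant may be worth trying. The following is only a minimal sketch in plain pandas and NumPy, not dfply piping: it assumes the same df as in the question and passes real Series to np.where instead of dfply's symbolic X, which is presumably what np.where cannot evaluate in the question's version.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': ['a', 'b', 'c', 'd', 'e']})

# np.where picks element-wise between the two branches,
# so no per-row lambda is needed. assign returns a new frame
# and leaves df itself unchanged.
new_df = df.assign(
    newCol=np.where(df['col2'] == 'c', df['col1'] - 2, df['col1'] + 4)
)
print(new_df)
```

The result should match the apply version above; assign is used here only because it mirrors mutate in returning a new dataframe.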

olinox14 · Answer 2 · 2019-06-18T18:25:17.043

0

The Python equivalent here would be an inline if-else expression (also called a conditional expression or ternary operator):

ifelse(col2 == 'c', col1 - 2, col1 + 4)

would then become:

col1 - 2 if col2 == 'c' else col1 + 4
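
Note that this expression evaluates a single condition, so it fits scalar values; on whole columns or lists it has to be applied element by element. A minimal sketch with plain Python lists (the helper name if_else_scalar is made up for illustration and is not part of any library):

```python
def if_else_scalar(c1, c2):
    # Hypothetical helper: the conditional expression handles one pair of values.
    return c1 - 2 if c2 == 'c' else c1 + 4

col1 = [1, 2, 3, 4, 5]
col2 = ['a', 'b', 'c', 'd', 'e']

# Apply the scalar rule to each (col1, col2) pair in turn.
new_col = [if_else_scalar(c1, c2) for c1, c2 in zip(col1, col2)]
print(new_col)  # [5, 6, 1, 8, 9]
```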

edited Jun 18 '19 at 18:25

answered Jun 18 '19 at 18:11

olinox14


Easy enough, but `new_df = (df >> mutate(newCol = col1 - 2 if col2 == 'c' else col1 + 4))` gives me the error "can only concatenate list (not "int") to list" – CoolGuyHasChillDay Jun 18 '19 at 18:19
Sorry, I wrote this a little too fast and didn't have time to find the right way of doing it. The newly posted answer may be better, though... – olinox14 Jun 18 '19 at 18:26
