how to compute a new column based on the values of other columns in pandas - python

Question

Let's say my data frame contains these data:

>>> df = pd.DataFrame({'a':['l1','l2','l1','l2','l1','l2'],
                       'b':['1','2','2','1','2','2']})
>>> df
    a       b
0  l1       1
1  l2       2
2  l1       2
3  l2       1
4  l1       2
5  l2       2

l1 should correspond to 1 whereas l2 should correspond to 2. I'd like to create a new column 'c' such that, for each row, c = 1 if a = l1 and b = 1 (or a = l2 and b = 2). If a = l1 and b = 2 (or a = l2 and b = 1) then c = 0.

The resulting data frame should look like this:

  a         b   c
0  l1       1   1
1  l2       2   1
2  l1       2   0
3  l2       1   0
4  l1       2   0
5  l2       2   1

My data frame is very large so I'm really looking for the most efficient way to do this using pandas.

chlunde · Accepted Answer · 2013-08-27T21:12:42.273

8

df = pd.DataFrame({'a': numpy.random.choice(['l1', 'l2'], 1000000),
                   'b': numpy.random.choice(['1', '2'], 1000000)})

A fast solution assuming only two distinct values:

%timeit df['c'] = ((df.a == 'l1') == (df.b == '1')).astype(int)

10 loops, best of 3: 178 ms per loop

@Viktor Kerkes:

%timeit df['c'] = (df.a.str[-1] == df.b).astype(int)

1 loops, best of 3: 412 ms per loop

@user1470788:

%timeit df['c'] = (((df['a'] == 'l1')&(df['b']=='1'))|((df['a'] == 'l2')&(df['b']=='2'))).astype(int)

1 loops, best of 3: 363 ms per loop

@herrfz

%timeit df['c'] = (df.a.apply(lambda x: x[1:])==df.b).astype(int)

1 loops, best of 3: 387 ms per loop

edited Aug 27 '13 at 21:12

answered Aug 27 '13 at 18:54

chlunde

1,547
11
11

2

Interesting, however it your solution is significantly less general. What's interesting here is how bad `str[1]` methods compared to a simple lambda. – Andy Hayden Aug 27 '13 at 19:09
You did not test for when `(df.a == 'l2') == (df.b == '2')`. – Steven Rumbalski Aug 27 '13 at 20:11
@StevenRumbalski I assume the example input is complete, and `a` only has values `l1` or `l2` and `b` only `'1'` or `'2'`. If `a != 'l1'`, it must be `'l2'`. – chlunde Aug 27 '13 at 20:20
@user2716201: I don't believe the OP's sentence "Let's say my data frame contains these data" supports that assumption. I think you should to clearly call out that assumption in your answer. – Steven Rumbalski Aug 27 '13 at 20:40
1

@StevenRumbalski: good point, updated description. But I would not be surprised if in the real data the values might be 'dog' and 'cat', in which case the substring solutions would break... I would also like to point out that OP asked for the most efficient solution. – chlunde Aug 27 '13 at 21:17

score 6 · Answer 2 · answered Aug 27 '13 at 18:32

6

You can also use the string methods.

df['c'] = (df.a.str[-1] == df.b).astype(int)

answered Aug 27 '13 at 18:32

Viktor Kerkez

45,070
12
104
85

herrfz · Answer 3 · 2013-08-27T18:36:05.747

1

df['c'] = (df.a.apply(lambda x: x[1:])==df.b).astype(int)

edited Aug 27 '13 at 18:36

answered Aug 27 '13 at 18:25

herrfz

4,814
4
26
37

score -1 · Answer 4 · answered Aug 27 '13 at 18:32

You can just use logical operators. I'm not sure why you're using strings of 1 and 2 rather than ints, but here's a solution. The astype at the end converts it from boolean to 0's and 1's.

df['c'] = (((df['a'] == 'l1')&(df['b']=='1'))|((df['a'] == 'l2')&(df['b']=='2'))).astype(int)

how to compute a new column based on the values of other columns in pandas - python

4 Answers4

Linked