
I'm trying to create a new feature using

df_transactions['emome'] = df_transactions['emome'].apply(lambda x: 1 if df_transactions['plan_list_price'] ==0 & df_transactions['actual_amount_paid'] > 0 else 0).astype(int)

But it raises an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I create a new column that returns 1 when plan_list_price is 0 and actual_amount_paid is >0 else 0?

I would like to still use pandas apply.

Chia Yi
  • "I would like to still use pandas apply." why? – cs95 Jan 26 '18 at 11:20
  • Because I've met this problem few times before and I want to learn the proper way of using pandas apply. – Chia Yi Jan 26 '18 at 11:21
  • The proper way of using apply... is to not use it at all ;) Also, the reason is because you used & when you should have used `and`. Don't use them interchangeably. `&` is logical AND _only_ in the context of dataframes. – cs95 Jan 26 '18 at 11:22
  • The problem is not apply per se. It is your misconception on how to use multiple logical conditions, for which there is a duplicate. – cs95 Jan 26 '18 at 11:23
  • I tried using `and`; it still returns the same error – Chia Yi Jan 26 '18 at 11:23
  • Duplicate? Where? – Chia Yi Jan 26 '18 at 11:24
  • It's not a 1:1 duplicate... but here it is: https://stackoverflow.com/questions/22591174/pandas-multiple-conditions-while-indexing-data-frame-unexpected-behavior – cs95 Jan 26 '18 at 11:25
  • Take note that in this statement: `df_transactions['plan_list_price'] ==0 & df_transactions['actual_amount_paid'] > 0`, `&` binds more tightly than `==` and `>`, so Python evaluates it as the chained comparison `df_transactions['plan_list_price'] == (0 & df_transactions['actual_amount_paid']) > 0`. The implicit `and` in that chain needs the truth value of a Series, which is what gives you the error. – Aditya Santoso Apr 11 '19 at 02:21

2 Answers


You are really close, but a vectorized solution without apply is much better: build a boolean mask and convert it to int:

mask = (df_transactions['plan_list_price'] == 0) & \
       (df_transactions['actual_amount_paid'] > 0)
df_transactions['emome'] = mask.astype(int)
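
The parentheses around each comparison matter. As a minimal sketch (the small `df` below is hypothetical toy data, not the asker's), this shows the unparenthesized expression raising the same ValueError while the parenthesized mask works:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({'plan_list_price': [0, 0, 5],
                   'actual_amount_paid': [0, 9, 4]})

# Without parentheses, `&` is applied before `==` and `>`, so Python sees
# the chained comparison  price == (0 & paid) > 0,  which needs the truth
# value of a whole Series and therefore fails.
try:
    df['plan_list_price'] == 0 & df['actual_amount_paid'] > 0
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison yields a boolean Series first and
# `&` then combines the two element-wise.
mask = (df['plan_list_price'] == 0) & (df['actual_amount_paid'] > 0)
flags = mask.astype(int).tolist()

print(raised, flags)
```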

If you really want the slower apply:

f = lambda x: 1 if x['plan_list_price'] ==0 and x['actual_amount_paid'] > 0 else 0
df_transactions['emome'] = df_transactions.apply(f, axis=1)

Sample:

import pandas as pd

df_transactions = pd.DataFrame({'A':list('abcdef'),
                                'plan_list_price':[0,0,0,5,5,0],
                                'actual_amount_paid':[-1,0,9,4,2,3]})


mask = (df_transactions['plan_list_price'] == 0) & \
       (df_transactions['actual_amount_paid'] > 0)
df_transactions['emome1'] = mask.astype(int)

f = lambda x: 1 if x['plan_list_price'] ==0 and x['actual_amount_paid'] > 0 else 0
df_transactions['emome2'] = df_transactions.apply(f, axis=1)
print(df_transactions)

   A  actual_amount_paid  plan_list_price  emome1  emome2
0  a                  -1                0       0       0
1  b                   0                0       0       0
2  c                   9                0       1       1
3  d                   4                5       0       0
4  e                   2                5       0       0
5  f                   3                0       1       1

Timings:

#[60000 rows]
df_transactions = pd.concat([df_transactions] * 10000, ignore_index=True)

In [201]: %timeit df_transactions['emome1'] = ((df_transactions['plan_list_price'] == 0) & (df_transactions['actual_amount_paid'] > 0)).astype(int)
1000 loops, best of 3: 971 µs per loop

In [202]: %timeit df_transactions['emome2'] = df_transactions.apply(lambda x: 1 if x['plan_list_price'] ==0 and x['actual_amount_paid'] > 0 else 0, axis=1)
1 loop, best of 3: 1.15 s per loop
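
`%timeit` is an IPython magic, so the lines above will not run in a plain Python script. A rough equivalent using the standard `timeit` module might look like the sketch below; the helper names `vectorized` and `row_wise` and the reduced size of 6000 rows are assumptions made for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Same sample columns as above, repeated to 6000 rows (an arbitrary size
# chosen here so the script finishes quickly).
df = pd.DataFrame({'plan_list_price': [0, 0, 0, 5, 5, 0],
                   'actual_amount_paid': [-1, 0, 9, 4, 2, 3]})
df = pd.concat([df] * 1000, ignore_index=True)


def vectorized():
    # Boolean mask built from whole columns, then cast to int.
    return ((df['plan_list_price'] == 0) &
            (df['actual_amount_paid'] > 0)).astype(int)


def row_wise():
    # The same logic evaluated one row at a time through apply.
    return df.apply(
        lambda x: 1 if x['plan_list_price'] == 0 and x['actual_amount_paid'] > 0 else 0,
        axis=1,
    )


# Check that both approaches agree before comparing their speed.
same = vectorized().tolist() == row_wise().tolist()

t_vec = timeit.timeit(vectorized, number=10)
t_row = timeit.timeit(row_wise, number=10)
print(f"same result: {same}")
print(f"vectorized: {t_vec:.4f} s, apply: {t_row:.4f} s (10 runs each)")
```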
jezrael
  • I would like to use df_transactions['emome'] = df_transactions['emome'].apply(xxx), how can i fill in the xxx part? – Chia Yi Jan 26 '18 at 11:20

A few issues:

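  • On the right side of the assignment, the new field (emome) is not created yet.
  • The lambda should work on its argument x, not on df_transactions. Referring to df_transactions inside the lambda compares whole columns instead of single values, which is what raises the ValueError.
  • You need to pass axis=1, since you are applying the function to each row (the default, axis=0, applies it to each column).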

From Doc:

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the function is applied:

  • 0 or ‘index’: apply function to each column.
  • 1 or ‘columns’: apply function to each row.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
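
To make the axis distinction concrete, here is a small sketch on hypothetical two-row data (the names `per_column` and `per_row` are made up for this example). With the default axis the function sees one column at a time; with axis=1 it sees one row at a time and can read several columns by name:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({'plan_list_price': [0, 5],
                   'actual_amount_paid': [9, 4]})

# axis=0 (the default): the function receives each column as a Series.
per_column = df.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series indexed by column
# name, so a condition on two columns can use plain `and`.
per_row = df.apply(
    lambda row: int(row['plan_list_price'] == 0 and row['actual_amount_paid'] > 0),
    axis=1,
)

print(per_column.to_dict())
print(per_row.tolist())
```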

Jake