How to control which formula to apply in a function depending on the current value of two pandas dataframe arguments?

Question

I've got a function f(a, b) that takes two pandas DataFrames and applies different formulas to the values, like this:

```python
import pandas as pd

def f(a, b):
    if a > 0 and b > 0:
        return a + b
    elif a > 0 and b < 0:
        return a - b
    elif a < 0 and b > 0:
        return a * b
    elif a < 0 and b < 0:
        return a / b
    else:
        print('bad')

dfa = pd.DataFrame({'a': [1, 1]})
dfb = pd.DataFrame({'b': [2, 2]})
f(dfa, dfb)  # raises ValueError: the truth value ... is ambiguous
```

The problem is that I need the branching to depend on the current value being processed, element by element. However, using the `and` operator leads to this error:

"The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()"

and using `&` instead leads to:

"cannot compare [type] array with a scalar of type [bool]"
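For reference, a minimal sketch of why both attempts fail, assuming the two columns have been pulled out as Series (the values of `a` and `b` below are illustrative only): `and` asks Python for one truth value for the whole Series, which pandas refuses to give, while `&` works element-wise but binds more tightly than `>`, so each comparison needs its own parentheses.

```python
import pandas as pd

# Illustrative values: one column from each DataFrame, as Series
a = pd.Series([1, 1, -1, -1])
b = pd.Series([1, -1, 1, -1])

# `and` calls bool() on a Series, which raises ValueError
try:
    (a > 0) and (b > 0)
except ValueError as exc:
    print(exc)  # pandas' "truth value of a Series is ambiguous" message

# `&` is element-wise, but it binds tighter than `>`,
# so each comparison is wrapped in parentheses
mask = (a > 0) & (b > 0)
print(mask.tolist())  # [True, False, False, False]
```

The result is a boolean mask with one entry per row, which is the element-wise condition the branching in `f` needs.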

Edit:

Considering the answers, I'm starting to realize that my minimal example may not convey my intention very well.

```python
import pandas as pd

def f(a, b):
    # operationA..D and operationY are placeholders for the real computations
    if a > 0 and b > 0:
        X = operationA()
    elif a > 0 and b < 0:
        X = operationB()
    elif a < 0 and b > 0:
        X = operationC()
    elif a < 0 and b < 0:
        X = operationD()
    else:
        print('bad')

    Y = operationY()
    return X, Y

# both DataFrames are parts of a training-example label: label = (a, b)
df_label_partA = pd.DataFrame({'a': [1, 1, -1, -1]})
df_label_partB = pd.DataFrame({'b': [1, -1, 1, -1]})
f(df_label_partA, df_label_partB)
```

The DataFrames can't be processed separately, because each is one half of a list of labels (basically the label tuples split up into 2 lists).
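One way to keep each label's two halves together is to iterate over them as pairs. The sketch below assumes both DataFrames have the same length and row order; `f_scalar` and the formulas inside it are hypothetical stand-ins for operationA..D and operationY, not the real computations.

```python
import pandas as pd

def f_scalar(a, b):
    """Branch on one (a, b) label pair and return (X, Y).

    All formulas here are hypothetical stand-ins for operationA..D / operationY.
    """
    if a > 0 and b > 0:
        x = a + b           # stand-in for operationA
    elif a > 0 and b < 0:
        x = a - b           # stand-in for operationB
    elif a < 0 and b > 0:
        x = a * b           # stand-in for operationC
    elif a < 0 and b < 0:
        x = a / b           # stand-in for operationD
    else:
        x = float('nan')    # a or b is zero: no branch applies
    y = abs(a) + abs(b)     # stand-in for operationY
    return x, y

df_label_partA = pd.DataFrame({'a': [1, 1, -1, -1]})
df_label_partB = pd.DataFrame({'b': [1, -1, 1, -1]})

# zip pairs the two halves of each label by position
pairs = zip(df_label_partA['a'], df_label_partB['b'])
X, Y = zip(*(f_scalar(a, b) for a, b in pairs))

# Turn the two result tuples back into Series aligned with the labels
X = pd.Series(X, index=df_label_partA.index)
Y = pd.Series(Y, index=df_label_partA.index)
```

Because the scalar function sees one pair at a time, plain `and`/`if` work as written; the cost is a Python-level loop over every label.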

Always share the entire error message. I **strongly recommend** reading the Pandas docs. — AMC, Feb 08 '20 at 03:20
https://stackoverflow.com/questions/21415661/logical-operators-for-boolean-indexing-in-pandas, https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o — AMC, Feb 08 '20 at 03:21

score 0 · Accepted Answer · edited Mar 13 '23 at 15:07


Try this:

```python
pd.concat([dfa, dfb], axis=1).apply(lambda x: f(*x), axis=1)
```

Outputs:

0    3
1    3
dtype: int64
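Row-wise `apply` calls `f` once per row in Python, which can get slow on large frames. For formulas as simple as those in the original `f`, one vectorized alternative is `numpy.select`; this is a swapped-in technique, not part of this answer. The sketch assumes rows where `a` or `b` is zero should become `NaN` rather than printing 'bad', and `f_vectorized` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def f_vectorized(a: pd.Series, b: pd.Series) -> pd.Series:
    """Apply the four sign-dependent formulas element-wise."""
    x = a.to_numpy(dtype=float)
    y = b.to_numpy(dtype=float)
    conditions = [
        (x > 0) & (y > 0),
        (x > 0) & (y < 0),
        (x < 0) & (y > 0),
        (x < 0) & (y < 0),
    ]
    # Every formula is evaluated for every row, so silence divide-by-zero
    # warnings from rows that np.select will not pick anyway
    with np.errstate(divide='ignore', invalid='ignore'):
        choices = [x + y, x - y, x * y, x / y]
    # Rows matching no condition (a or b equal to 0) get NaN
    result = np.select(conditions, choices, default=np.nan)
    return pd.Series(result, index=a.index)

dfa = pd.DataFrame({'a': [1, 1]})
dfb = pd.DataFrame({'b': [2, 2]})
print(f_vectorized(dfa['a'], dfb['b']).tolist())  # [3.0, 3.0]
```

`np.select` picks, for each row, the choice belonging to the first true condition, which mirrors the order of the `if`/`elif` chain.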

edited by halfer

answered Feb 07 '20 at 20:31 by Grzegorz Skibinski

How does the lambda call handle 2 return values? I've edited my question – Markus Feb 08 '20 at 00:48
_how does the lambda call handle 2 return values?_ It doesn't, the output is a Series, the leftmost numbers are the index. – AMC Feb 08 '20 at 03:24
So ```pd.concat``` puts both columns side by side in one DataFrame. With ```axis=1```, ```apply``` passes each row to the lambda as a Series, and ```*x``` unpacks that row's values into the arguments of ```f```. – Grzegorz Skibinski Feb 08 '20 at 07:54
Check for instance: ```pd.concat([dfa,dfb], axis=1).apply(lambda x: list([*x]), axis=1)``` - ```*``` in this case unpacks the values – Grzegorz Skibinski Feb 08 '20 at 08:03
Thank you for your clarification. This approach appears to be very promising. My last issue, however, is that the call returns a tuple in the form `(x, y)` for each row. Do you happen to know how to convert it back into separate output variables, like `X, Y = concat(...)`, or do you think a new thread would be better? Thank you again. – Markus Feb 08 '20 at 13:16
Hm, I'm not sure I understand what you mean. You can always decompose it by referring to a particular column, i.e. if ```df=pd.concat(...)``` then ```x=df['a']``` and ```y=df['b']``` – Grzegorz Skibinski Feb 08 '20 at 14:12
[Screenshot](https://i.stack.imgur.com/bKfYw.png) Here's a screenshot of the output. _r_ basically contains a single-column DataFrame-like object, and each cell of that column holds a tuple of 4 values. I'd rather return Series, though, for further use. Thank you for your help. Really appreciate it. – Markus Feb 08 '20 at 15:29
OK, this should work for you: ```a,b,c,d=verify(...)[...].apply(pd.Series).to_numpy().T``` – Grzegorz Skibinski Feb 08 '20 at 19:59
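A small sketch of what that one-liner does, assuming the result `r` is a Series holding one tuple per row (the values here are made up, and two fields are used instead of four): `apply(pd.Series)` expands each tuple into a row of a DataFrame, `.to_numpy().T` turns its columns into rows so they can be unpacked, and selecting the expanded columns directly keeps them as pandas Series with the original index.

```python
import pandas as pd

# Stand-in for the result of a row-wise apply that returns (x, y) per row
r = pd.Series([(1, 2.5), (3, 4.5), (5, 6.5)])

# Expand each tuple into its own row: a DataFrame with columns 0 and 1
parts = r.apply(pd.Series)

# The comment's approach: transpose so each column unpacks as a NumPy array
x_arr, y_arr = parts.to_numpy().T

# Alternative that keeps pandas Series and the original index
X, Y = parts[0], parts[1]
```

On large results, `pd.DataFrame(r.tolist(), index=r.index)` builds an equivalent DataFrame and is typically faster than `apply(pd.Series)`.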
Thank you. It sort of works now. I'm receiving a warning at runtime, though: _D:\anaconda\lib\site-packages\ipykernel_launcher.py:15: RuntimeWarning: invalid value encountered in double_scalars from ipykernel import kernelapp as app_ I didn't get this message before adding `.apply(pd.Series).to_numpy().T` – Markus Feb 08 '20 at 20:53
That would be odd, assuming the data hasn't changed in between, but maybe it's related to this: https://stackoverflow.com/a/37784106/11610186 – Grzegorz Skibinski Feb 08 '20 at 21:07

Turns out there was one single value out of 160,000 that caused this issue. It's working now, although a bit slowly, but it gets the job done. Thank you for your help – Markus Feb 09 '20 at 00:40

score 0 · Answer 2 · answered Feb 07 '20 at 20:34


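You can try this:

```python
import pandas as pd

def f(a, b):
    # Work on each DataFrame's single column as a Series; the builtin all()
    # on a DataFrame would iterate over column labels, not values
    a, b = a.iloc[:, 0], b.iloc[:, 0]
    if (a > 0).all() and (b > 0).all():
        return a + b
    elif (a > 0).all() and (b < 0).all():
        return a - b
    elif (a < 0).all() and (b > 0).all():
        return a * b
    elif (a < 0).all() and (b < 0).all():
        return a / b
    else:
        print('bad')

dfa = pd.DataFrame({'a': [1, 1]})
dfb = pd.DataFrame({'b': [2, 2]})
f(dfa, dfb)
```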

Output:

0    3
1    3
dtype: int64

answered by Ch3steR

Thanks for the reply. If I understand `all(..)` correctly, it's true only when all entries meet the condition. However, I know the values are a mix of positive, negative, and 0, and each a is linked to the b at the same index, so I need to work on the values row by row. a and b are of equal size, though. I edited my question – Markus Feb 08 '20 at 00:46
