-2

I have a function f(a, b) that takes two pandas DataFrames and applies a different formula depending on the signs of the values, like this:

import pandas as pd

def f(a, b):
   if a > 0 and b > 0:
      return a + b
   elif a > 0 and b < 0:
      return a - b
   elif a < 0 and b > 0:
      return a * b
   elif a < 0 and b < 0:
      return a / b
   else:
      print('bad')

dfa = pd.DataFrame({'a':[1, 1]})
dfb = pd.DataFrame({'b':[2, 2]})
f(dfa,dfb)

The issue is that the function needs to branch on the value currently being processed. Using the and operator, however, raises the following error:

"The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()"

and using & instead leads to:

"cannot compare [type] array with a scalar of type [bool]"

Edit:

Considering the answers, I'm starting to realize that my minimal example might not convey my intention very well.

def f(a, b):
  if a > 0 and b > 0:
    X = operationA()
  elif a > 0 and b < 0:
    X = operationB()
  elif a < 0 and b > 0:
    X = operationC()
  elif a < 0 and b < 0:
    X = operationD()
  else:
    print('bad')

  Y = operationY()
  return X, Y

# each training label is a pair (a, b), split across the two DataFrames
df_label_partA = pd.DataFrame({'a':[1, 1, -1, -1]})
df_label_partB = pd.DataFrame({'b':[1, -1, 1, -1]})
f(df_label_partA, df_label_partB)

The DataFrames can't be processed separately, because each one holds one half of the label pairs (basically a list of tuples split up into 2 lists).

Markus
  • Always share the entire error message. I **strongly recommend** reading the Pandas docs. – AMC Feb 08 '20 at 03:20
  • https://stackoverflow.com/questions/21415661/logical-operators-for-boolean-indexing-in-pandas, https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o – AMC Feb 08 '20 at 03:21
  • No. To clarify it for _you_: the issue is the comparison. – Markus Feb 08 '20 at 12:29
  • Isn’t that what the two linked questions are about? – AMC Feb 08 '20 at 13:15

2 Answers

0

Try this:

pd.concat([dfa,dfb], axis=1).apply(lambda x: f(*x), axis=1)

Outputs:

0    3
1    3
dtype: int64
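
If f returns two values per row, as in the edited question, one option is to let apply expand each returned tuple into columns. The following is only a sketch: the formulas inside f are hypothetical stand-ins for operationA() to operationD() and operationY(), and the column names X and Y are assumptions made for illustration.

```python
import pandas as pd

def f(a, b):
    # Hypothetical stand-ins for operationA..D and operationY from the question.
    if a > 0 and b > 0:
        x = a + b
    elif a > 0 and b < 0:
        x = a - b
    elif a < 0 and b > 0:
        x = a * b
    elif a < 0 and b < 0:
        x = a / b
    else:
        x = float("nan")  # at least one of the two values is 0
    y = abs(a) + abs(b)
    return x, y

dfa = pd.DataFrame({'a': [1, 1, -1, -1]})
dfb = pd.DataFrame({'b': [1, -1, 1, -1]})

pairs = pd.concat([dfa, dfb], axis=1)
# result_type='expand' turns each returned tuple into separate columns.
result = pairs.apply(lambda row: f(row['a'], row['b']), axis=1, result_type='expand')
result.columns = ['X', 'Y']
X, Y = result['X'], result['Y']
```

X and Y are then ordinary Series that share the index of the input frames, so they can be used separately afterwards.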
halfer
Grzegorz Skibinski
  • how does the lambda call handle 2 return values? I've edited my question – Markus Feb 08 '20 at 00:48
  • _how does the lambda call handle 2 return values?_ It doesn't, the output is a Series, the leftmost numbers are the index. – AMC Feb 08 '20 at 03:24
  • So `pd.concat` will put both columns side by side. Then `*x` will just unpack them column-wise (hence `axis=1` in `apply`). – Grzegorz Skibinski Feb 08 '20 at 07:54
  • Check for instance: `pd.concat([dfa,dfb], axis=1).apply(lambda x: list([*x]), axis=1)` - `*` in this case unpacks the values – Grzegorz Skibinski Feb 08 '20 at 08:03
  • Thank you, for your clarification. This approach appears to be very promising. My last issue is, however, that the call returns a list of tuples in the form `(x, y)` for each line. Do you happen to know how to convert it back into output variables, like `X, Y = concat(...)` or do you think a new thread would be good? Thank you again. – Markus Feb 08 '20 at 13:16
  • Hm, not sure if I understand what you mean. You can always decompose it by referring to a particular column, i.e. if `df=pd.concat(...)` then `x=df['a']` and `y=df['b']` – Grzegorz Skibinski Feb 08 '20 at 14:12
  • [Screenshot](https://i.stack.imgur.com/bKfYw.png) Here's a screenshot of the output. _r_ basically contains a sort of DataFrame with a single column, and this column contains a tuple of 4 values. I'd rather return Series for further use, though. Thank you for your help. Really appreciate it. – Markus Feb 08 '20 at 15:29
  • Ok, this should work for you: `a,b,c,d=verify(...)[...].apply(pd.Series).to_numpy().T` – Grzegorz Skibinski Feb 08 '20 at 19:59
  • Thank you. It sort of works now. I'm receiving a warning at runtime, though: _D:\anaconda\lib\site-packages\ipykernel_launcher.py:15: RuntimeWarning: invalid value encountered in double_scalars from ipykernel import kernelapp as app_ I didn't receive the message before adding `.apply(pd.Series).to_numpy().T` – Markus Feb 08 '20 at 20:53
  • It would be odd, assuming data hasn't changed in between, but maybe it's something around this: https://stackoverflow.com/a/37784106/11610186 – Grzegorz Skibinski Feb 08 '20 at 21:07
  • Turns out there was one single value in 160 000 that caused this issue. It's working now, although a bit slow. But it gets the job done. Thank you for your help – Markus Feb 09 '20 at 00:40
0

You can try this

import pandas as pd

def f(a, b):
   # Work on the columns: all(df > 0) would iterate over column labels, not values.
   a, b = a['a'], b['b']
   if (a > 0).all() and (b > 0).all():
      return a + b
   elif (a > 0).all() and (b < 0).all():
      return a - b
   elif (a < 0).all() and (b > 0).all():
      return a * b
   elif (a < 0).all() and (b < 0).all():
      return a / b
   else:
      print('bad')

dfa = pd.DataFrame({'a':[1, 1]})
dfb = pd.DataFrame({'b':[2, 2]})
f(dfa,dfb)

output

0    3
1    3
dtype: int64
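
The all(...) checks above test whole columns at once. For per-row branching, as the question asks, a vectorized alternative is to build element-wise masks with & and pick the matching formula with numpy.select. This is a sketch under assumptions, not the method from this answer: it assumes the column names a and b from the question, equal-length columns with a shared index, and no zeros in b (so that a / b is defined everywhere).

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame({'a': [1, 1, -1, -1, 0]})
dfb = pd.DataFrame({'b': [2, -2, 2, -2, 2]})
a, b = dfa['a'], dfb['b']

# Parentheses are required: & binds more tightly than > and <.
conditions = [
    (a > 0) & (b > 0),
    (a > 0) & (b < 0),
    (a < 0) & (b > 0),
    (a < 0) & (b < 0),
]
# One candidate result per condition, in the same order.
choices = [a + b, a - b, a * b, a / b]

# Rows matching no condition (a zero in either column) get NaN.
result = pd.Series(np.select(conditions, choices, default=np.nan), index=a.index)
```

Note that every entry in choices is computed for all rows before the selection happens, so an expensive or partially undefined operation is still evaluated on rows where it is not used.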
Ch3steR
  • Thanks for the reply. If I understand all(..) correctly, it's true when all entries meet the condition. However, I only know that each value is positive, negative, or 0, and a is linked to b, so I need the values at the same index. a and b are of equal size, though. I edited my question – Markus Feb 08 '20 at 00:46