
I have a vector, and I want to compute its Pearson correlation with every row of a pandas DataFrame. I am trying the following:

df.apply(apply_func, axis=1, args=(np.array([1,2,3])), raw=True)

apply_func simply takes two NumPy arrays and calculates the correlation:

def apply_func(v1, v2):
     #do stuff

However, I get the following error when I run this:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I've set breakpoints in apply_func and execution never reaches it. I'm sure I'm using apply incorrectly, but I'm not sure how. I expected each row of df to be passed to apply_func as the first positional argument, with whatever is in args filling the remaining ones. Is this not correct?

EDIT: I have created a simple example below, in which apply_func should just add the two vectors. It still produces the same error.

import numpy as np
import pandas as pd

data = {'k1': [1, 2, 3], 'k2': [4, 5, 6], 'k3': [7, 8, 9]}
df = pd.DataFrame(data)

def apply_func(v1, v2):
    return v1 + v2

df.apply(apply_func, axis=1, args=(np.array([1,2,3])), raw=True)
sedavidw

1 Answer


I was able to solve my own question after finding the following post:

python pandas: apply a function with arguments to a series. Update

My particular situation produced a different error (no idea why), but their solution worked. By changing

args=(np.array([1,2,3]))

to

args=(np.array([1,2,3]),)  #<-- NOTE THE COMMA

I ensure that args is a tuple, which is what apply expects, and I get the result I was expecting.
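
For anyone landing here later, a minimal sketch of why the comma matters, reusing the example frame from the question. Parentheses on their own only group an expression, so (np.array([1,2,3])) is still the array itself, and pandas then receives an array where it expects a tuple of extra arguments; that is a plausible source of the truth-value error, though I have not traced it through the pandas source. The names vec, add_vectors and pearson are mine, and np.corrcoef is just one way to get a Pearson correlation, not necessarily what the original apply_func did.

```python
import numpy as np
import pandas as pd

vec = np.array([1, 2, 3])

# Parentheses only group; the trailing comma is what builds a tuple.
not_a_tuple = (vec)
one_tuple = (vec,)
print(type(not_a_tuple).__name__)  # ndarray
print(type(one_tuple).__name__)    # tuple

df = pd.DataFrame({'k1': [1, 2, 3], 'k2': [4, 5, 6], 'k3': [7, 8, 9]})

def add_vectors(row, other):
    # With raw=True each row arrives as a plain NumPy array.
    return row + other

def pearson(row, other):
    # Off-diagonal entry of the 2x2 correlation matrix (illustrative choice).
    return np.corrcoef(row, other)[0, 1]

# args must be a tuple, even when there is only one extra argument.
added = df.apply(add_vectors, axis=1, args=(vec,), raw=True)
corr = df.apply(pearson, axis=1, args=(vec,), raw=True)

print(added)
print(corr)
```

Each row of df is passed as the first positional argument and the single element of args as the second, which is the behaviour the question expected.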

sedavidw