I have a pandas Series and a function that I want to apply to each element of the Series. The function takes an additional argument too. So far so good: see, for example, "python pandas: apply a function with arguments to a series".

Update

What if the argument itself varies, running over a given list? I faced this problem in my code and found a straightforward solution, but it is quite specific and (even worse) does not use the apply method.

Here is a toy example:

a=pd.DataFrame({'x': [1,2]})
t=[10,20]

I want to multiply the elements of a['x'] by the elements of t. Here the function is quite simple and len(t) matches len(a['x'].index), so I could just do:

a['t']=t
a['x*t']=a['x']*a['t']

But what if the function is more elaborate, or the two lengths do not match?

What I would like is a line like:

a['x'].apply(lambda x,y: x*y, args=t)

The point is that this line exits with an error: args must be a tuple of extra positional arguments, and with this two-argument lambda it can hold only one value, which is passed unchanged for every element. I do not see any 'place' to put the various elements of t.

user2988577
  • Can you give an example that actually shows what you're trying to do? As you say, the example you gave doesn't really have the problem you're trying to solve. – BrenBarn Jan 29 '14 at 20:30
  • Edited the question. I hope it is more clear now. – user2988577 Jan 29 '14 at 20:38
  • That is somewhat more clear, but I still don't understand how you want the values in `t` to be used. If `t` is of a different length than `a['x']`, what do you want to happen? How do you want the values in `t` to be matched up with the values in `a['x']`? – BrenBarn Jan 29 '14 at 20:44
  • Well I imagine a sort of broadcast of t in order to match the length of a['x']. Let's say len(t)=2 and len(a['x'])=4 then t[0] would operate with a['x'][0] and a['x'][2] while t[1] on a['x'][1] and a['x'][3] – user2988577 Jan 29 '14 at 20:49
  • That's an unusual broadcast rule, and not one that will be widely desired. So the Pandas API doesn't directly handle it for you. Your best bet would be to write a function that maps the `t` vector into a correctly-sized column in the data frame, using whatever mapping convention you'd like, and after that is created, *then* you can just use a simple `apply` or `map` or basic array function to operate on them. But you shouldn't want Pandas to support arbitrary ways of broadcasting elements. That interface would be so wide open it would necessitate that the data structure was meaningless. – ely Jan 29 '14 at 21:43

1 Answer

What you're looking for is similar to what R calls "recycling", where an operation on arrays of unequal length loops through the shorter array as many times as needed to match the length of the longer one.

I'm not aware of any simple, built-in way to do this with numpy or pandas. What you can do is use np.tile to repeat your smaller array. Something like:

a.x * np.tile(t, len(a) // len(t))

This will only work if the longer array's length is an exact integer multiple of the shorter one's.
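When the lengths are not an exact multiple, one possible workaround is np.resize, which fills the requested size with repeated copies of its input. The sketch below is only an illustration under assumptions: the five-row frame, the column name 'f', and the function f are hypothetical stand-ins for a "more elaborate" function, and itertools.cycle is swapped in for that case because it pairs each element with the recycled values without building an intermediate array.

```python
from itertools import cycle

import numpy as np
import pandas as pd

a = pd.DataFrame({'x': [1, 2, 3, 4, 5]})  # hypothetical data, length 5
t = [10, 20]                              # shorter list, length 2

# np.resize repeats t cyclically until it has len(a) elements
recycled = np.resize(t, len(a))
a['x*t'] = a['x'] * recycled


def f(x, y):
    """Hypothetical stand-in for a more elaborate two-argument function."""
    return x * y + 1


# zip stops at the end of a['x']; cycle(t) supplies t[0], t[1], t[0], ...
a['f'] = [f(x, y) for x, y in zip(a['x'], cycle(t))]
```

Both variants follow the rule described in the comments: t[0] pairs with rows 0, 2, 4 and t[1] with rows 1, 3.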

The behavior you want is somewhat unusual. Depending on what you're doing, there may be a better way to handle it. Relying on the values to match up in the desired way just by repetition is a little fragile. If you have some way to match up the values in each array that you want to multiply, you could use the .map method of Series to select the right "other value" to multiply each element of your Series with.
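As a rough sketch of the .map idea, suppose each row carries a key that says which multiplier it should use. The 'group' column and the multiplier dictionary here are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical data: 'group' is an explicit key saying which multiplier applies
a = pd.DataFrame({'x': [1, 2, 3, 4],
                  'group': ['p', 'q', 'p', 'q']})
multiplier = {'p': 10, 'q': 20}  # hypothetical lookup table

# Series.map looks up each key in the dict and returns the matching value
a['t'] = a['group'].map(multiplier)
a['x*t'] = a['x'] * a['t']
```

Matching on an explicit key keeps working if rows are reordered or filtered, which positional repetition does not.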

BrenBarn