I would like to apply a function foo(column, index, value) to every cell of a dataframe, as fast as possible, and end up with a dataframe of the same shape in which each cell holds the result of the function.

def foo(dates, name, value):
    return black_box_function(dates, name, value)

I want dates to be the column label of the cell, name to be its index label, and value to be the content of the cell in the dataframe.

I tried to implement it as df.apply(foo(df['index'], df['column'])), but it doesn't work.

boboo
  • So you want to apply a function to the whole dataframe? – Dani Mesejo Jan 18 '19 at 21:44
  • @DanielMesejo yes – boboo Jan 18 '19 at 21:45
  • You could use some vectorization, it will depend on the function, please include the details. – Dani Mesejo Jan 18 '19 at 21:47
  • The more detail about what it is that you're trying to accomplish, the more we can help you achieve what you want. Right now, it is unclear. Can you provide examples of the function? Sample input and output? – PMende Jan 18 '19 at 22:49
  • @PMende just did – boboo Jan 18 '19 at 22:50
  • @JoePatten `.apply` is generally not more efficient than iterating over the rows using `.itertuples`, it essentially uses that underneath the hood anyway – juanpa.arrivillaga Jan 18 '19 at 22:52
  • What does some sample input and desired output look like? Feel free to change values if you need to avoid divulging information, so long as they still represent what you're trying to achieve. – PMende Jan 18 '19 at 22:52
  • @PMende I just want to end up with the same dataframe but all value equal to the output of the function which is a float, as the initial values of the dataframe. I don't have the implementation of the function I am calling though. – boboo Jan 18 '19 at 22:54
  • @Spenrose well if the function is designed to work on scalar values, then you really have no choice but to use something like `.apply`, `.applymap`, or manual looping. – juanpa.arrivillaga Jan 18 '19 at 22:55
  • @juanpa.arrivillaga how to specify which parameters on the apply function ? – boboo Jan 18 '19 at 23:00
  • What's wrong with the loop? – juanpa.arrivillaga Jan 18 '19 at 23:07
  • @juanpa.arrivillaga I edited the post to explain what I am doing and not working – boboo Jan 18 '19 at 23:10
  • @Spenrose why do you want to use apply? Why don't you just stick with your current solution? – juanpa.arrivillaga Jan 18 '19 at 23:12
  • @juanpa.arrivillaga It doesn't work: i have a "key error : index" – boboo Jan 18 '19 at 23:13
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186946/discussion-between-spenrose-and-juanpa-arrivillaga). – boboo Jan 18 '19 at 23:14

1 Answer

You can use np.vectorize to create a vectorized function that takes dataframe columns (or any other array-like objects) as parameters. See my example below (note that the arguments you pass to the vectorized function must all be the same length, or more generally broadcastable against each other):

import numpy as np

def foo(val1, val2, val3):
    """Do some stuff in here with your function parameters."""
    return val1 * val2 * val3

# this will create a new column in your dataframe called 'new_col'
# each row in df.new_col will be the result of foo applied to that row
df['new_col'] = np.vectorize(foo)(df.col1, df.col2, df.col3)

Refer to the docs for np.vectorize.
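
To fill every cell rather than add one new column, the same idea can be combined with broadcasting: pass the column labels as a row vector, the index labels as a column vector, and the values as the full 2-D array. The following is only a minimal sketch under assumptions. black_box_function here is a made-up stand-in for the real one, and the column labels are assumed to be plain strings. Also keep in mind that, according to the NumPy docs, np.vectorize is provided for convenience rather than performance and is essentially a for loop, so this is tidy but not necessarily faster than looping yourself.

```python
import numpy as np
import pandas as pd


def black_box_function(date, name, value):
    # Hypothetical stand-in for the real function: it only needs to take
    # three scalars and return a float.
    return value + 100 * int(date[-2:]) + (0.5 if name == "b" else 0.0)


def foo(dates, name, value):
    return black_box_function(dates, name, value)


# Small example frame (assumed layout: names on the index, dates as columns).
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=["a", "b"],
    columns=["2019-01-18", "2019-01-19"],
)

# Shape (1, n_cols) and (n_rows, 1) broadcast against the (n_rows, n_cols) values,
# so each call receives the cell's column label, index label and value.
dates = df.columns.to_numpy()[np.newaxis, :]
names = df.index.to_numpy()[:, np.newaxis]

# otypes=[float] avoids an extra probing call to work out the output type.
vfoo = np.vectorize(foo, otypes=[float])

result = pd.DataFrame(
    vfoo(dates, names, df.to_numpy()),
    index=df.index,
    columns=df.columns,
)
print(result)
```

The result has the same index and columns as df. As for the original attempt, df['index'] looks up a column literally named 'index', which is likely where the KeyError comes from, and df.apply expects a callable rather than the value returned by calling foo.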

kennyvh