
I would like to use the pandas.rolling_apply function to apply my own custom function on a rolling window basis.

However, my function requires two arguments and also has two outputs. Is this possible?

Below is a minimal reproducible example:

import pandas as pd
import numpy as np
import random
tmp  = pd.DataFrame(np.random.randn(2000,2)/10000, 
                    index=pd.date_range('2001-01-01',periods=2000),
                    columns=['A','B'])

def gm(df,p):
    v =(((df+1).cumprod())-1)*p
    return v.iloc[-1]

# an example output when subsetting for just 2001
gm(tmp.loc['2001'],5)


# the aim is to do it on a rolling basis over a 50 day window
# whilst also getting both outputs and also allows me to add in the parameter p=5
# or any other number I want p to be... 
pd.rolling_apply(tmp,50,gm)

which leads to an error, since gm takes two arguments.

Any help would be greatly appreciated.

EDIT

Following Jeff's comment I have made progress, but I am still struggling with outputs of two or more columns. If I instead define a new function (below) that just returns two random numbers (unconnected to the previous calculation) rather than the last row of v, I get the error TypeError: only length-1 arrays can be converted to Python scalars.

def gm2(df,p):
    df = pd.DataFrame(df)
    v =(((df+1).cumprod())-1)*p
    return np.random.rand(2)

pd.rolling_apply(tmp,50,lambda x: gm2(x,5)).tail(20)

This function works if the 2 in np.random.rand(2) is changed to 1.

h.l.m

1 Answer


At the moment, rolling_apply passes numpy arrays to the applied function; by 0.14 it should pass a frame. The issue is here

So redefine your function to work on a numpy array. (You can of course construct a DataFrame inside here, but your index/column names won't be the same).

In [9]: def gm(df,p):
   ...:     v = ((np.cumprod(df+1))-1)*p
   ...:     return v[-1]
   ...: 
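As a quick sanity check, here is a minimal sketch showing that the array-based version gives the same number as the pandas calculation on a single window. The names gm_array and window are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

def gm_array(arr, p):
    # arr is a 1-D numpy array holding one window of one column
    v = (np.cumprod(arr + 1) - 1) * p
    return v[-1]

# one made-up 50-point window, scaled like the data in the question
rng = np.random.default_rng(0)
window = rng.standard_normal(50) / 10000

from_numpy = gm_array(window, 5)
# the same calculation written with pandas methods on a Series
from_pandas = (((pd.Series(window) + 1).cumprod()) - 1).iloc[-1] * 5

print(np.isclose(from_numpy, from_pandas))
```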

If you want to use more pandas functions in your custom function, do this (note that the indices of the calling frame are not passed at the moment).

def gm(arr,p):
    df = pd.DataFrame(arr)  # rebuilt frame has a default integer index
    v =(((df+1).cumprod())-1)*p
    return v.iloc[-1]

Pass it through a lambda:

In [11]: pd.rolling_apply(tmp,50,lambda x: gm(x,5)).tail(20)
Out[11]: 
                   A         B
2006-06-04  0.004207 -0.002112
2006-06-05  0.003880 -0.001598
2006-06-06  0.003809 -0.002228
2006-06-07  0.002840 -0.003938
2006-06-08  0.002855 -0.004921
2006-06-09  0.002450 -0.004614
2006-06-10  0.001809 -0.004409
2006-06-11  0.001445 -0.005959
2006-06-12  0.001297 -0.006831
2006-06-13  0.000869 -0.007878
2006-06-14  0.000359 -0.008102
2006-06-15 -0.000885 -0.007996
2006-06-16 -0.001838 -0.008230
2006-06-17 -0.003036 -0.008658
2006-06-18 -0.002280 -0.008552
2006-06-19 -0.001398 -0.007831
2006-06-20 -0.000648 -0.007828
2006-06-21 -0.000799 -0.007616
2006-06-22 -0.001096 -0.006740
2006-06-23 -0.001160 -0.006004

[20 rows x 2 columns]
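For readers on a newer pandas, where pd.rolling_apply no longer exists, the same idea can be sketched with Rolling.apply. This assumes a pandas version in which Rolling.apply accepts the raw and args parameters; the smaller 200-row frame is only there to keep the example quick.

```python
import numpy as np
import pandas as pd

tmp = pd.DataFrame(np.random.randn(200, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

def gm(arr, p):
    # arr is one window of one column, passed as a numpy array
    return (np.cumprod(arr + 1) - 1)[-1] * p

# raw=True hands each window to gm as a numpy array;
# args supplies the extra positional argument p
res_args = tmp.rolling(50).apply(gm, raw=True, args=(5,))

# the lambda form from the answer is equivalent
res_lambda = tmp.rolling(50).apply(lambda x: gm(x, 5), raw=True)

print(res_args.tail())
```

Either way the function is still called once per column per window and has to return a single number.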
Jeff
  • how do you "redefine your function to work on a numpy array."? – h.l.m Jan 09 '14 at 17:17
  • you can only use numpy functions (and not pandas functions); or you can do ``DataFrame(df)`` to make it a frame – Jeff Jan 09 '14 at 17:38
  • Does this mean that in the custom function I can only run numpy functions and no pandas functions? – h.l.m Jan 09 '14 at 17:59
  • as I said, you *could* wrap the passed numpy array in a DataFrame if you want, then use pandas function, BUT, it will have only basic indexes; I'll change the answer to illustrate – Jeff Jan 09 '14 at 18:07
  • Is the lambda function parsing each column separately? – h.l.m Jan 09 '14 at 20:34
  • it's getting an array that has those limits; put a print statement in and look at it – Jeff Jan 09 '14 at 20:41
  • If I just have `def gm3(df,p): print df` and execute `pd.rolling_apply(tmp,50,lambda x: gm3(x,5))`, only the first 50 rows of column A are printed. Is there a way to get both columns at the same time, say to calculate a customised rolling correlation/regression? – h.l.m Jan 09 '14 at 20:55
  • @Jeff I believe that `rolling_apply` (and as of 0.18, `rolling`) still return numpy arrays rather than pandas dataframes. Did I miss something? – IanS May 24 '16 at 08:27
  • that's still an open issue; pull requests to fix it are welcome – Jeff May 24 '16 at 10:47
  • `rolling_apply(..)` has been moved to [`Rolling.apply`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.window.Rolling.apply.html#pandas-core-window-rolling-apply). – gies0r Apr 27 '20 at 21:17
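
On the question raised in the comments about seeing both columns at once and returning more than one value per window: one possible workaround, not something rolling_apply itself offers, is to slice the windows by hand. The sketch below assumes a hypothetical helper rolling_frame_apply and an illustrative function corr_and_beta; neither comes from pandas or from the original post.

```python
import numpy as np
import pandas as pd

tmp = pd.DataFrame(np.random.randn(200, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

def rolling_frame_apply(df, window, func):
    # hypothetical helper: call func on every full window of rows
    # (all columns together) and stack the returned Series as rows,
    # labelled by the last date of each window
    rows = [func(df.iloc[end - window:end])
            for end in range(window, len(df) + 1)]
    return pd.DataFrame(rows, index=df.index[window - 1:])

def corr_and_beta(win):
    # two outputs per window: correlation of A and B,
    # and the least-squares slope of B regressed on A
    corr = win['A'].corr(win['B'])
    beta = win['A'].cov(win['B']) / win['A'].var()
    return pd.Series({'corr': corr, 'beta': beta})

out = rolling_frame_apply(tmp, 50, corr_and_beta)
print(out.tail())
```

This is a plain Python loop, so it will be slow on long frames. For a plain rolling correlation, the built-in `tmp['A'].rolling(50).corr(tmp['B'])` avoids the loop entirely.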