
I am trying to apply a function to each row in a dataframe. The problem is, the function requires output from the previous row as an input.

I want to use this function:

import math

def emaIrregular(alpha, sample, sampleprime, deltats, emaprime):
  # emaprime is the EMA value computed for the previous row
  a = deltats / float(alpha)
  u = math.exp(-a)
  v = (1 - u) / a

  return (u * emaprime) + ((v - u) * sampleprime) + ((1.0 - v) * sample)

The issue is the parameter emaprime: it is the EMA value of the previous row, which is itself an output of this function. I am aware I can shift the df to get the sampleprime and deltats values.

The function I am using is slightly complex: here is a toy example I hope will help.

def myRollingSum(x, xprime):
  return x + xprime

So it is similar to a rolling sum, as it uses the output from the previous iteration as the input for the next.


Edit: OK, the myRollingSum example is throwing people off. I need to access the result of the previous row, but this result is the thing being computed, i.e. f(x_i) = f(x_{i-1}) + c. Alternatively, it is similar to the way a factorial is computed.

My data is sparse and irregularly spaced. It is not feasible to resample/interpolate and run over this expanded dataset for each window.

I have a feeling there is not an easy way to do this, apart from iterating over each record one by one?

  • pandas already implements some [exponentially weighted moving window functions](http://pandas.pydata.org/pandas-docs/stable/api.html#exponentially-weighted-moving-window-functions); if that is not what you need, then perhaps [`.rolling_apply`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.rolling_apply.html) – behzad.nouri Nov 17 '15 at 01:08
  • Unfortunately the ewma function does not behave too well with my data. I have sparse and irregular time-series data. – Andrew Leeming Nov 17 '15 at 12:10

2 Answers


It looks like .rolling_apply would work, as behzad.nouri suggested.

Another, cruder but possibly easier to follow, way would be to use .shift(1) to make a shifted column. Then use the numpy function vectorize to call a function with the two columns as inputs.

import numpy as np

# The first row has no predecessor, so its shifted value is NaN
df['shifted'] = df["x"].shift(1)
def myRollingSum(x, xprime):
  return x + xprime
df['rsum'] = np.vectorize(myRollingSum)(df['x'], df['shifted'])
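To make the behaviour of the shift approach concrete, here is a small sketch on made-up data (the values in `df` are purely illustrative). It shows that shifting gives each row access to the previous input value only, which is not the same as a running sum that builds on the previous output.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the question
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

def myRollingSum(x, xprime):
    return x + xprime

# Pairwise version: each row only sees the previous *input* value
pairwise = np.vectorize(myRollingSum)(df["x"], df["x"].shift(1))

# True running sum: each row builds on the previous *output* value
running = df["x"].cumsum()

print(pairwise.tolist())  # [nan, 3.0, 5.0, 7.0]
print(running.tolist())   # [1.0, 3.0, 6.0, 10.0]
```

If the two outputs differ for your function, as they do here, the shift approach is not computing the recurrence you are after.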
mirthbottle
  • OK, maybe I am bad at explaining this, or the rolling sum example threw people off the real problem (or maybe I'm stupid and not seeing it). f(x_t) = f(x_{t-1}) + c would be the basic form, I guess. Perhaps factorial is a better example. – Andrew Leeming Nov 17 '15 at 12:23
  • In .rolling_apply, you get access to the current data and the previous data (as many points as the window you set) in the form of an array. You can have the function use only the t-1 data point. – mirthbottle Nov 17 '15 at 19:15
  • Oh I see, you actually want to use all the previous data points in a recursive function? – mirthbottle Nov 17 '15 at 19:19
  • Well it is recursive but each iteration is saved back to the series, meaning I only need to access the last value. How do I access the t-1 data point in rolling_apply? The docs don't seem to list an obvious parameter for this. – Andrew Leeming Nov 18 '15 at 14:03
  • You do this: pd.rolling_apply(yourseries, 2, yourfunction). Your function receives a list, which is [x_t-1, x_t], so you can reference x_t-1 as yourlist[0]. – mirthbottle Nov 18 '15 at 23:56
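The window-of-two idea from the comments above can be sketched as follows. This assumes a more recent pandas, where the top-level `pd.rolling_apply` is no longer available and `Series.rolling(2).apply(...)` is used instead; the series values and the helper name `diff_from_previous` are made up for illustration.

```python
import pandas as pd

# Illustrative data
s = pd.Series([1.0, 2.0, 4.0, 7.0])

def diff_from_previous(window):
    # With raw=True, window is a NumPy array [x_{t-1}, x_t]
    return window[1] - window[0]

result = s.rolling(2).apply(diff_from_previous, raw=True)
print(result.tolist())  # [nan, 1.0, 2.0, 3.0]
```

Note that the window contains values of the input series. It does not contain the function's own previous result, so on its own this does not cover a recurrence such as f(x_t) = f(x_{t-1}) + c.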

It looks like you want to apply a recursive function. In that case, .rolling_apply won't work. One way would be to use the series values as a list or numpy array. Then loop through the list to use the recursive function.

Your function should be calling itself to look something like this.

def factorial(i, alist):
    # Multiplies alist[1] .. alist[i]; alist[0] is never used
    if i > 0:
        print(alist[i-1])
        return alist[i]*factorial(i-1,alist)
    else:
        return 1

If you want to do it through the dataframe, you can put all the values of the series in a list and use the reset index as the position of each row. Then you can call the factorial function (or whatever your function is) using numpy.vectorize, telling it not to broadcast over the list.

import numpy as np

alldata = df["x"].values.tolist()  # .values is an attribute, not a method
df = df.reset_index()
# excluded=[1] passes the whole list to every call instead of one element per row
df["fact"] = np.vectorize(factorial, excluded=[1])(df["index"], alldata)

I think this solution will execute faster than using iterrows(), but I'm not sure.
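For the irregular EMA from the question, the plain loop over a list mentioned at the top of this answer could look roughly like the sketch below. The names `ema_irregular` and `ema_series` are hypothetical, and seeding the first EMA value with the first sample is an assumption rather than something the question specifies. Timestamps are assumed to be strictly increasing, since a zero time delta would divide by zero.

```python
import math

def ema_irregular(alpha, sample, sampleprime, deltats, emaprime):
    # Same formula as in the question, with sampleprime as the previous sample
    a = deltats / float(alpha)
    u = math.exp(-a)
    v = (1 - u) / a
    return (u * emaprime) + ((v - u) * sampleprime) + ((1.0 - v) * sample)

def ema_series(alpha, times, samples):
    """Run the recurrence over parallel lists of timestamps and samples."""
    ema = [samples[0]]  # assumption: seed with the first sample
    for i in range(1, len(samples)):
        deltats = times[i] - times[i - 1]  # must be > 0
        # ema[-1] is the previous output, which is what shift() cannot provide
        ema.append(ema_irregular(alpha, samples[i], samples[i - 1], deltats, ema[-1]))
    return ema

# Illustrative, irregularly spaced data
times = [0.0, 1.0, 4.0, 4.5]
samples = [10.0, 12.0, 11.0, 15.0]
result = ema_series(2.0, times, samples)
print(result)
```

The returned list has one entry per row, so it can be assigned back as a new column, for example `df["ema"] = ema_series(alpha, df["ts"].tolist(), df["x"].tolist())` where `ts` and `x` are hypothetical column names.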

mirthbottle