What is the most efficient way to generate a series from a recurrence relation in Python using pandas and NumPy?

Question

Using pandas and NumPy, what is the most efficient way to do what the `f1` function below does?

import numpy as np
import pandas as pd
from time import time

n = 10000
df = pd.DataFrame()
df["a"] = np.random.randn(n)
df["b"] = np.random.uniform(size=n)  # n samples in [0, 1); uniform(n) would return a single scalar


def f1(df):
    # c[0] = 100; c[i] = a[i] * b[i] + (1 - a[i]) * c[i - 1]
    df.loc[0, "c"] = 100
    for i in range(1, len(df)):
        df.loc[i, "c"] = df.loc[i, "a"] * df.loc[i, "b"] +\
            (1 - df.loc[i, "a"]) * df.loc[i - 1, "c"]

start_time = time()
f1(df)
elapsed_time = time() - start_time
print(elapsed_time)

What do you want `f1` to do? Do you really want `i` to be the index of `df`? — Shihe Zhang, Nov 09 '17 at 07:34
Instead of using `for` with `range`, `iteritems()` is a good option. — Shihe Zhang, Nov 09 '17 at 07:36
Hello Shihe, no, I do not really need `i` to be the index, but how would you write `f1` with `iteritems`? I tried `iterrows`, but it was not an improvement. — vwrobel, Nov 09 '17 at 07:47
Yes ags29, I will turn to Cython if there is no efficient solution with numpy/pandas :) — vwrobel, Nov 09 '17 at 08:07
`iterrows` converts each row to a Series and so should be slow. Have you tried `itertuples` instead? Look at piRSquared's answer here: [Iterate Rows](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas). Also, you may want to try Numba before Cython to get the benefit of JIT compilation. — skrubber, Nov 09 '17 at 08:37
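Following the `itertuples` suggestion, a minimal sketch might iterate over the rows once, carry the previous value in a local list, and assign the whole column at the end instead of writing cell by cell with `.loc`. The function name `f1_itertuples` and the `c0` parameter are hypothetical; the starting value 100 comes from the question. No timing is claimed here; it is still a Python-level loop.

```python
import numpy as np
import pandas as pd


def f1_itertuples(df, c0=100.0):
    """Return a copy of df with column c computed by the question's recurrence."""
    values = []
    # itertuples yields lightweight namedtuples, avoiding a Series per row
    for row in df.itertuples(index=False):
        if values:
            # c[i] = a[i] * b[i] + (1 - a[i]) * c[i - 1]
            values.append(row.a * row.b + (1 - row.a) * values[-1])
        else:
            values.append(c0)  # c[0] is fixed, as in the question
    out = df.copy()
    out["c"] = values  # one column assignment instead of n .loc writes
    return out


if __name__ == "__main__":
    demo = pd.DataFrame({"a": np.random.randn(5), "b": np.random.uniform(size=5)})
    print(f1_itertuples(demo))
```

Returning a copy rather than mutating `df` is a design choice of this sketch; assigning `df["c"] = values` directly would work as well.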

score 1 · Accepted Answer · answered Nov 09 '17 at 18:57

Sometimes scipy.signal can solve such recurrences, but I do not find a good solution here. The Numba workaround:

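import numpy as np
import numba

@numba.njit
def f1n(a, b):
    # Same recurrence as f1, on plain NumPy arrays compiled by Numba
    c = np.empty_like(a)
    c[0] = 100
    for i in range(1, len(a)):
        c[i] = a[i] * b[i] + (1 - a[i]) * c[i - 1]
    return c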
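If Numba might not be installed everywhere the code runs, one common pattern is to fall back to an identity decorator so the same kernel still runs as plain Python (correct, just much slower). The sketch below assumes the question's layout (float columns `a` and `b`, `c[0] = 100`) and also stores the result back into the DataFrame; the names `recurrence` and `add_c_column` are hypothetical.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Numba not available: leave the function uncompiled (plain Python)
    def njit(func):
        return func


@njit
def recurrence(a, b, c0):
    """c[0] = c0; c[i] = a[i] * b[i] + (1 - a[i]) * c[i - 1]."""
    c = np.empty_like(a)
    if len(a) == 0:
        return c
    c[0] = c0
    for i in range(1, len(a)):
        c[i] = a[i] * b[i] + (1 - a[i]) * c[i - 1]
    return c


def add_c_column(df, c0=100.0):
    # Numba works on NumPy arrays, not DataFrames, so extract float64 arrays
    a = df["a"].to_numpy(dtype=np.float64)
    b = df["b"].to_numpy(dtype=np.float64)
    df["c"] = recurrence(a, b, c0)
    return df
```

With Numba installed, the first call pays the compilation cost; later calls with the same argument types reuse the compiled code.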

Tests:

In [559]: %timeit f1n(df.a.values,df.b.values)
52.9 µs ± 1.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [560]: %timeit f1(df)
4.62 s ± 13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [563]: np.allclose(df.c,f1n(df.a.values,df.b.values))
Out[563]: True

Roughly 90,000 times faster (4.62 s vs. 52.9 µs), and just as readable.
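For comparison, `scipy.signal` does apply when the coefficient is constant. If `a` were a single scalar `alpha` rather than a column, the recurrence c[i] = alpha * b[i] + (1 - alpha) * c[i - 1] is a first-order IIR filter that `scipy.signal.lfilter` evaluates in compiled code. In the question `a` varies per row, so a fixed-coefficient filter does not fit. A minimal sketch, assuming a constant `alpha`, `c[0] = 100`, and at least two samples in `b`; the function name is hypothetical.

```python
import numpy as np
from scipy.signal import lfilter


def constant_alpha_recurrence(alpha, b, c0=100.0):
    """c[0] = c0; c[i] = alpha * b[i] + (1 - alpha) * c[i - 1] for i >= 1."""
    b = np.asarray(b, dtype=np.float64)
    # Difference equation: y[n] - (1 - alpha) * y[n - 1] = alpha * x[n]
    num = [alpha]
    den = [1.0, -(1.0 - alpha)]
    # Initial filter state so that the "previous output" seen by b[1] is c0
    zi = np.array([(1.0 - alpha) * c0])
    tail, _ = lfilter(num, den, b[1:], zi=zi)
    return np.concatenate(([c0], tail))
```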