-1

I want to do some complex calculations in pandas while referencing previous values (basically I'm calculating row by row). However the loops take forever and I wanted to know if there was a faster way. Everybody keeps mentioning using shift but I don't understand how that would even work.

df = pd.DataFrame(index=range(500)
df["A"]= 2
df["B"]= 5
df["A"][0]= 1
for i in range(len(df):
    if i != 0: df['A'][i] = (df['A'][i-1] / 3) - df['B'][i-1] + 25
  • [`Series.shift`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.shift.html#pandas-series-shift) is used to access the previous row's values. If you gave an example of what you expect your output to be and how these calculations are supposed to work it would be helpful. – Henry Ecker Jun 20 '21 at 00:22
  • looks like you are using a recalcuated value in A, so not sure a shift can be done successfully here. Looping may be the only option. The other option is that your logic is incorrect and you don't need to use a recalculated value for column A. as noted in the first comment, please show expected output. – Jonathan Leon Jun 20 '21 at 00:56

1 Answers1

0
df = pd.DataFrame(index=range(5000))
df["A"]= 2
df["B"]= 5
df["A"][0]= 1
import numpy_ext as npe

# for i in range(len(df):
#     if i != 0: df['A'][i] = (df['A'][i-1] / 3) - df['B'][i-1] + 25

# SO example - function of previous values in A and B
def f(A,B):
    r = np.sum(A[:-1]/3) - np.sum(B[:-1] + 25) if len(A)>1 else A[0]
    return r

# much simpler example, sum of previous values
def g(A):
    return np.sum(A[:-1])

df["AB_combo"] = npe.expanding_apply(f, 1, df["A"].values, df["B"].values)
df["A_running"] = npe.expanding_apply(g, 1, df["A"].values)

print(df.head(10).to_markdown())

sample output

A B AB_combo A_running
0 1 5 1 0
1 2 5 -29.6667 1
2 2 5 -59 3
3 2 5 -88.3333 5
4 2 5 -117.667 7
5 2 5 -147 9
6 2 5 -176.333 11
7 2 5 -205.667 13
8 2 5 -235 15
9 2 5 -264.333 17
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30