Is there a way to reference a previous value in Pandas column efficiently?

Question

I want to do some complex calculations in pandas while referencing previous values (basically I'm calculating row by row). However the loops take forever and I wanted to know if there was a faster way. Everybody keeps mentioning using shift but I don't understand how that would even work.

df = pd.DataFrame(index=range(500)
df["A"]= 2
df["B"]= 5
df["A"][0]= 1
for i in range(len(df):
    if i != 0: df['A'][i] = (df['A'][i-1] / 3) - df['B'][i-1] + 25

[`Series.shift`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.shift.html#pandas-series-shift) is used to access the previous row's values. If you gave an example of what you expect your output to be and how these calculations are supposed to work it would be helpful. — Henry Ecker, Jun 20 '21 at 00:22
looks like you are using a recalcuated value in A, so not sure a shift can be done successfully here. Looping may be the only option. The other option is that your logic is incorrect and you don't need to use a recalculated value for column A. as noted in the first comment, please show expected output. — Jonathan Leon, Jun 20 '21 at 00:56

score 0 · Answer 1 · answered Jun 20 '21 at 10:17

numpy_ext can be used for expanding calculations
pandas-rolling-apply-using-multiple-columns for reference
I have also included a simpler calc to demonstrate behaviour in simpler way

df = pd.DataFrame(index=range(5000))
df["A"]= 2
df["B"]= 5
df["A"][0]= 1
import numpy_ext as npe

# for i in range(len(df):
#     if i != 0: df['A'][i] = (df['A'][i-1] / 3) - df['B'][i-1] + 25

# SO example - function of previous values in A and B
def f(A,B):
    r = np.sum(A[:-1]/3) - np.sum(B[:-1] + 25) if len(A)>1 else A[0]
    return r

# much simpler example, sum of previous values
def g(A):
    return np.sum(A[:-1])

df["AB_combo"] = npe.expanding_apply(f, 1, df["A"].values, df["B"].values)
df["A_running"] = npe.expanding_apply(g, 1, df["A"].values)

print(df.head(10).to_markdown())

sample output

	A	B	AB_combo	A_running
0	1	5	1	0
1	2	5	-29.6667	1
2	2	5	-59	3
3	2	5	-88.3333	5
4	2	5	-117.667	7
5	2	5	-147	9
6	2	5	-176.333	11
7	2	5	-205.667	13
8	2	5	-235	15
9	2	5	-264.333	17

Is there a way to reference a previous value in Pandas column efficiently?

1 Answers1

sample output