Here's the code in question:
```python
import pandas as pd

df = pd.DataFrame(
    [
        list(range(200)),
        list(range(200, 400))
    ],
    index=['col_1', 'col_2']
).transpose()

col_1_index = df.columns.get_loc('col_1')
col_2_index = df.columns.get_loc('col_2')
target_1 = 2

for i in range(2, len(df)):
    if (
        df.iloc[i - 2, col_1_index] -
        df.iloc[i - 1, col_2_index]
    ) > target_1:
        col_2_value = (
            df.iloc[i - 1, col_2_index] +
            target_1
        )
    elif (
        df.iloc[i - 1, col_2_index] -
        df.iloc[i - 2, col_1_index]
    ) > target_1:
        col_2_value = (
            df.iloc[i - 1, col_2_index] -
            target_1
        )
    else:
        col_2_value = df.iloc[i - 2, col_1_index]
    df.iloc[i, col_2_index] = col_2_value

df

'''
# expected output
     col_1  col_2
0        0    200
1        1    201
2        2    199
3        3    197
4        4    195
...    ...    ...
195    195    193
196    196    194
197    197    195
198    198    196
199    199    197
'''
```
My issue is that I can't use the common methods of speeding up the iteration, such as `df.itertuples()` or `df.apply()`, because each row depends on the previous row's calculated value.

The logic iterates over the DataFrame, comparing the `col_1` value at t-2 with the `col_2` value at t-1 to decide what to assign to `col_2` at t. So `col_1` is static, while the `col_2` value at time t is updated on each iteration.
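
To make the recurrence concrete, here is a minimal sketch that runs the same logic on plain NumPy arrays instead of going through `df.iloc` for every element. The helper name `fill_col_2` is hypothetical, and the sketch assumes `target` is non-negative, in which case the three branches reduce to clamping `col_1[t-2]` to within `target` of the previous `col_2` value. The loop is still sequential, so this is not a vectorized solution and I have not benchmarked it.

```python
import numpy as np
import pandas as pd


def fill_col_2(col_1, col_2, target):
    """Apply the t-2 / t-1 recurrence on arrays (hypothetical helper).

    Assumes target >= 0, so the if/elif/else collapses to a clamp.
    """
    col_1 = np.asarray(col_1)
    out = np.array(col_2, copy=True)  # leave the caller's data untouched
    for t in range(2, len(out)):
        prev = out[t - 1]
        # Move towards col_1[t-2], but by at most `target` per step.
        out[t] = min(max(col_1[t - 2], prev - target), prev + target)
    return out


df = pd.DataFrame({"col_1": np.arange(200), "col_2": np.arange(200, 400)})
df["col_2"] = fill_col_2(
    df["col_1"].to_numpy(), df["col_2"].to_numpy(), target=2
)
print(df.tail())
```

The first two rows of `col_2` are left as they are, matching the original loop, which starts at index 2.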