I need to iterate over a DataFrame indexed by UNIX timestamp, and in one column, assign a value from another column in a different row at a specific index time in the future. This is what I'm currently doing:
df = pd.DataFrame([
[1523937600, 100.0, 0.0],
[1523937660, 120.0, 0.0],
[1523937720, 110.0, 0.0],
[1523937780, 90.0, 0.0],
[1523937840, 99.0, 0.0]],
columns=['time', 'value', 'target'])
df.set_index('time', inplace=True)
skip = 2 # mins skip-ahead
for i in range(0, df.shape[0]-1):
t = df.index[i] + (60*skip)
try:
df.iloc[i].target = df.loc[t].value
except KeyError:
df.iloc[i].target = 0.0
Output:
value target
time
1523937600 100.0 110.0
1523937660 120.0 90.0
1523937720 110.0 99.0
1523937780 90.0 0.0
1523937840 99.0 0.0
This works, but I am dealing with datasets containing millions of rows and it takes an extremely long time. Is there a more optimal way to do this?
EDIT: Added example input/output. Note, it is important that I obtain the value from the row with the calculated index time rather than just look ahead n rows, as there could be gaps in the times, or additional times in between.