My data: df1 is a prepared DataFrame of logs.
In[1]: import pandas as pd
In[2]: df1 = pd.DataFrame([[1, 'confirmed', pd.Timestamp('01/01/2017 14:05:00')], [1, 'picked', pd.Timestamp('01/01/2017 14:10:00')]], columns = ['ID', 'log', 'time'])
In[3]: print(df1)
I iterate over it to find the rows with 'picked' in the 'log' column and take the corresponding time. For each such row, I then look at the log entry exactly one row before it.
df2 - new empty df with the same index as df1
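For completeness, df2 can be created roughly like this. This is a minimal sketch: the column names 'time1' and 'time2' are the ones the loop below fills, the times are kept as plain strings here for brevity, and creating the columns up front is an assumption about the setup.

```python
import pandas as pd

# Same sample rows as df1 (times kept as strings in this sketch)
df1 = pd.DataFrame(
    [[1, 'confirmed', '01/01/2017 14:05:00'],
     [1, 'picked', '01/01/2017 14:10:00']],
    columns=['ID', 'log', 'time'],
)

# Empty frame that shares df1's index; every cell starts out as NaN
df2 = pd.DataFrame(index=df1.index, columns=['time1', 'time2'])
```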
I have a loop that looks like this:
for row in df1.index:
    if df1['log'][row] == 'picked':
        df2['time1'][row] = df1['time'][row]
        if df1['ID'][row] == df1['ID'][row-1]:
            df2['time2'][row] = df1['time'][row-1]
It fills the 'time1' and 'time2' columns in the new df so that I can compute the time range between them, which is the time spent in the queue.
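To make the intent concrete, the same result can be sketched without an explicit loop by using `shift` and boolean masks. This is only a sketch under assumptions: df1 has the default integer index, its rows are already in the right order, 'time' is parsed as datetimes (the day/month order in the format string is a guess), and the column name 'queue_time' is made up for illustration. It has not been timed on the full 700 000 rows.

```python
import pandas as pd

# Same sample rows as df1; 'time' is parsed into datetimes
df1 = pd.DataFrame(
    [[1, 'confirmed', '01/01/2017 14:05:00'],
     [1, 'picked', '01/01/2017 14:10:00']],
    columns=['ID', 'log', 'time'],
)
# Assumed day/month/year order; swap %d and %m if the data is month-first
df1['time'] = pd.to_datetime(df1['time'], format='%d/%m/%Y %H:%M:%S')

# True where the row is a 'picked' log entry
picked = df1['log'] == 'picked'
# True where the previous row has the same ID (False for the first row)
same_id = df1['ID'] == df1['ID'].shift(1)

df2 = pd.DataFrame(index=df1.index)
# Time of the 'picked' row itself, NaT elsewhere
df2['time1'] = df1['time'].where(picked)
# Time of the row just before a 'picked' row with the same ID, NaT elsewhere
df2['time2'] = df1['time'].shift(1).where(picked & same_id)
# Hypothetical column: time spent in the queue
df2['queue_time'] = df2['time1'] - df2['time2']
```

One difference from the loop: the first row has no predecessor here, so its 'time2' is simply left as NaT.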
The loop produces the correct output, but it takes ages (df1 has 700 000 rows, and more than half of them have 'picked' in the 'log' column).
I would be very grateful for any suggestions on reducing the loop's run time or restructuring the loop.