Optimization of the loop

Question

My data is df1, a prepared DataFrame of logs:

import pandas as pd
# timestamps are quoted so that the snippet is valid Python
df1 = pd.DataFrame([[1, 'confirmed', '01/01/2017 14:05:00'], [1, 'picked', '01/01/2017 14:10:00']], columns=['ID', 'log', 'time'])
print(df1)

I iterate over it to find the rows with 'picked' in log and take the corresponding time. For each such row, I then look at the log row immediately before it.

df2 - new empty df with the same index as df1

I have a loop that looks like this:

for row in df1.index:
    if df1['log'][row] == 'picked':
        df2['time1'][row] = df1['time'][row]
        if df1['ID'][row] == df1['ID'][row-1]:
            df2['time2'][row] = df1['time'][row-1]

It fills the 'time1' and 'time2' columns in the new df so that I can compute the time range between them, which is the time spent in the queue.

The loop produces the correct output, but it takes ages (df1 has 700,000 rows and more than half of them have 'picked' in the 'log' column).

I would be very grateful for any suggestions on reducing the loop's running time or restructuring it.

Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. — jezrael, Mar 27 '18 at 10:21
I agree with @jezrael. Essentially we need a *small* example (say 5 rows in each dataframe) with inputs & desired output. — jpp, Mar 27 '18 at 10:26

score 0 · Answer 1 · answered Mar 27 '18 at 15:13

If df1 and df2 have the same indices then you might not have to use a for loop. To start, try this:

# this will replace the values in df2['time1'] with the values of df1['time'] where df1['log'] == 'picked'
df2.loc[df1['log'] == 'picked', 'time1'] = df1.loc[df1['log'] == 'picked', 'time']

Also, it would help if you could supply some sample data as the comments above have suggested.
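The same idea can probably be extended to 'time2' with `shift()`, which lines each row up with the row before it. The following is only a sketch under assumptions: the third sample row is made up for illustration, the timestamps are assumed to be day/month/year strings, and df1 is assumed to have a default 0..n-1 index sorted so that the previous row is the one the loop reads with `row-1`. The `queue_time` column name is hypothetical.

```python
import pandas as pd

# Sample data; the third row is invented to show the "different ID" case.
df1 = pd.DataFrame(
    [
        [1, "confirmed", "01/01/2017 14:05:00"],
        [1, "picked", "01/01/2017 14:10:00"],
        [2, "picked", "01/01/2017 15:00:00"],
    ],
    columns=["ID", "log", "time"],
)
# Assumption: the strings are day/month/year; adjust the format if not.
df1["time"] = pd.to_datetime(df1["time"], format="%d/%m/%Y %H:%M:%S")

# Boolean masks replacing the two `if` checks in the loop.
picked = df1["log"] == "picked"
same_id_as_previous = df1["ID"] == df1["ID"].shift()

df2 = pd.DataFrame(index=df1.index)
# Keep the time only on 'picked' rows; other rows become NaT.
df2["time1"] = df1["time"].where(picked)
# Previous row's time, only where the row is 'picked' and the ID matches.
df2["time2"] = df1["time"].shift().where(picked & same_id_as_previous)
# Time spent in the queue (hypothetical column name).
df2["queue_time"] = df2["time1"] - df2["time2"]

print(df2)
```

Because everything is computed on whole columns, there is no Python-level loop over the 700,000 rows. If the index is not a default RangeIndex, `shift()` still works positionally, which may differ from what `row-1` did in the original loop.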
