
I have a data frame of roughly 8 million rows, consisting of daily sales for 615 products across 16 stores over five years.

I need to make new columns that consist of the sales shifted back from 1 to 7 days. I've decided to sort the data frame by date, product and location. Then I concatenate product and location into its own column.

Using that column I loop through each unique product/location combination and make the shifted sales columns. The code is below:

import pandas as pd
#sort by date, product and location (dates are in the ORD_DATE column)
df = df.sort_values(['ORD_DATE', 'product', 'location'])
df['sort_values'] = df['product'] + "_" + df['location']

df1 = pd.DataFrame()
z = 0
for i in list(df['sort_values'].unique()):
    df_ = df[df['sort_values']==i]
    df_ = df_.sort_values('ORD_DATE')
    df_['eaches_1'] = df_['eaches'].shift(-1)
    df_['eaches_2'] = df_['eaches'].shift(-2)
    df_['eaches_3'] = df_['eaches'].shift(-3)
    df_['eaches_4'] = df_['eaches'].shift(-4)
    df_['eaches_5'] = df_['eaches'].shift(-5)
    df_['eaches_6'] = df_['eaches'].shift(-6)
    df_['eaches_7'] = df_['eaches'].shift(-7)
    df1 = pd.concat((df1, df_))
    z+=1
    if z % 100 == 0:
        print(z)

The above code gets me exactly what I want, but it takes forever to complete. Is there a faster way to accomplish this?

Jordan
  • why don't you add the `pandas` tag? – Salvatore Daniele Bianco Jun 09 '22 at 15:08
  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii Jun 09 '22 at 15:24
  • `for i in list(df['sort_values'].unique()): df_ = df[df['sort_values']==i]` - this looks like it could be accomplished with `.groupby('sort_values')`. How many unique df['sort_values'] are there? – wwii Jun 09 '22 at 15:31
  • How long do the sorts take? Seems like you could sort once on `['sort_values','ORD_DATE']` just before iterating then you wouldn't need to do it every iteration - can't tell whether that would help but getting stuff out of a loop always seems to improve things. – wwii Jun 09 '22 at 15:41
  • To extend on what @wwii mentions, you could profile all this to see where you are spending most time: https://docs.python.org/3/library/profile.html – jch Jun 09 '22 at 17:16
  • There are 6,800 unique 'sort_values' in the data set. I thought of using a 'groupby' but the shifted columns get off when a product changes in the dataset. – Jordan Jun 10 '22 at 10:14
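The `groupby` approach suggested in the comments does handle the boundary concern raised above: `groupby(...).shift()` restarts at each group boundary, so shifted values never bleed from one product/location into the next. A minimal sketch on a small synthetic frame (column names `product`, `location`, `ORD_DATE`, `eaches` taken from the question; the data here is made up for illustration):

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the real data: two product/location
# groups of 10 daily rows each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'product': np.repeat(['A', 'B'], 10),
    'location': 'store1',
    'ORD_DATE': list(pd.date_range('2022-01-01', periods=10)) * 2,
    'eaches': rng.integers(0, 100, size=20),
})

# Sort once, then shift within each group; rows from one
# product/location never leak into another.
df = df.sort_values(['product', 'location', 'ORD_DATE'])
grouped = df.groupby(['product', 'location'])['eaches']
for n in range(1, 8):
    df[f'eaches_{n}'] = grouped.shift(-n)
```

This replaces the Python-level loop over 6,800 subsets (and the repeated `pd.concat`) with seven vectorized shifts, which is typically orders of magnitude faster at this scale.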
