I have a data frame of roughly 8 million rows containing daily sales for 615 products across 16 stores over five years.
I need to make new columns that consist of the sales shifted back by 1 to 7 days. I've decided to sort the data frame by date, product and location. Then I concatenate product and location into their own column.
Using that column, I loop through each unique product/location combination and build the shifted sales columns. The code is below:
import pandas as pd
# sort values by date, product, location
df = df.sort_values(['date', 'product', 'location'])
df['sort_values'] = df['product']+"_"+df['location']
df1 = pd.DataFrame()
z = 0
for i in list(df['sort_values'].unique()):
    # select the rows for one product/location pair, in date order
    df_ = df[df['sort_values']==i]
    df_ = df_.sort_values('date')
    # shift(-k) pulls the value from k rows later within this pair
    df_['eaches_1'] = df_['eaches'].shift(-1)
    df_['eaches_2'] = df_['eaches'].shift(-2)
    df_['eaches_3'] = df_['eaches'].shift(-3)
    df_['eaches_4'] = df_['eaches'].shift(-4)
    df_['eaches_5'] = df_['eaches'].shift(-5)
    df_['eaches_6'] = df_['eaches'].shift(-6)
    df_['eaches_7'] = df_['eaches'].shift(-7)
    # append this pair's rows to the growing result
    df1 = pd.concat((df1, df_))
    # progress counter
    z+=1
    if z % 100 == 0:
        print(z)
The above code gets me exactly what I want, but takes FOREVER to complete. Is there a faster way to accomplish what I want?
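One direction that may avoid the Python-level loop and the repeated `pd.concat` is a grouped shift. The following is only a minimal sketch on made-up toy data, assuming the columns are named `date`, `product`, `location` and `eaches` as in the code above. The helper name `add_shifted_sales` is hypothetical, and this has not been benchmarked on the full frame.

```python
import pandas as pd

def add_shifted_sales(df, max_shift=7):
    """Add eaches_1..eaches_<max_shift>, shifted within each product/location pair.

    Hypothetical helper for illustration; column names are assumed to match
    the ones used in the loop above.
    """
    # Sort so that rows within each product/location pair are in date order.
    out = df.sort_values(['product', 'location', 'date']).copy()
    # Group once and reuse the grouped 'eaches' column for every shift.
    grouped = out.groupby(['product', 'location'], sort=False)['eaches']
    for k in range(1, max_shift + 1):
        # shift(-k) pulls the value from k rows later in the same group,
        # leaving NaN where the group has no such row.
        out[f'eaches_{k}'] = grouped.shift(-k)
    return out

# Toy data (made up): 2 products, 1 store, 3 days each.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03'] * 2),
    'product': ['A'] * 3 + ['B'] * 3,
    'location': ['S1'] * 6,
    'eaches': [1, 2, 3, 10, 20, 30],
})

result = add_shifted_sales(toy, max_shift=2)
print(result)
```

Like the loop, this shifts by row rather than by calendar day, so it matches the loop's output only if each product/location pair has exactly one row per day with no gaps.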