I have a data frame of orders with the order date as the index, which I set like this:
df = df.set_index('ORDER_ENTRY_DATE', drop=False)
In the code below I create a new feature containing the total amount a specific customer successfully paid in the last 8 weeks (excluding the current order):
df["LAST_8_WEEKS_SUCCESSFUL"] = (
    df["PAYMENT_SUCCESSFUL"].mul(df["TOTAL_AMOUNT"])
    .groupby(df["CUST_NO"])
    .transform(lambda x: x.rolling(window='56D', min_periods=1).sum().shift())
    .fillna(0)
)
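To make the intent concrete, here is a minimal sketch on made-up data (the five rows and the two output column names are hypothetical, not from my real dataset). It runs the same expression as above and, for comparison, a variant using `closed="left"` on the rolling window. As far as I understand, `shift()` after `rolling` returns the window that ended at the customer's previous order, while `closed="left"` anchors the window at the current order date and leaves out the current row, so the two can disagree when orders are far apart.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({
    "ORDER_ENTRY_DATE": pd.to_datetime(
        ["2021-01-01", "2021-01-05", "2021-01-10", "2021-03-20", "2021-03-25"]
    ),
    "CUST_NO": [1, 2, 1, 1, 2],
    "PAYMENT_SUCCESSFUL": [1, 1, 0, 1, 1],
    "TOTAL_AMOUNT": [100.0, 30.0, 50.0, 20.0, 10.0],
})

# Time-based rolling windows need a monotonic datetime index within each group.
df = df.sort_values("ORDER_ENTRY_DATE").set_index("ORDER_ENTRY_DATE", drop=False)

# Amount that was actually paid (0 for unsuccessful payments).
paid = df["PAYMENT_SUCCESSFUL"].mul(df["TOTAL_AMOUNT"])

# Same expression as in the question: rolling sum, then shift by one row.
df["LAST_8_WEEKS_SHIFT"] = (
    paid.groupby(df["CUST_NO"])
    .transform(lambda x: x.rolling(window="56D", min_periods=1).sum().shift())
    .fillna(0)
)

# Variant: window is [t - 56 days, t), so the current order is excluded directly.
df["LAST_8_WEEKS_CLOSED_LEFT"] = (
    paid.groupby(df["CUST_NO"])
    .transform(lambda x: x.rolling(window="56D", min_periods=1, closed="left").sum())
    .fillna(0)
)

print(df[["CUST_NO", "LAST_8_WEEKS_SHIFT", "LAST_8_WEEKS_CLOSED_LEFT"]])
```

On this toy data the two columns agree for the 2021-01-10 order of customer 1 (both 100) but differ for the 2021-03-20 order: the shifted version still reports 100 from early January, while the `closed="left"` version reports 0 because nothing was paid in the preceding 56 days.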
I have tested this code on a smaller version of my dataset and it works fine, but when I run it on the full 28-million-row dataset, I get a memory error:
MemoryError: Unable to allocate 220. MiB for an array with shape (28879273,) and data type int64
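For reference, a quick check of the arithmetic behind that number: the reported shape times 8 bytes per `int64` value comes out to the size in the message, so the failing allocation appears to be a single full-length column. The variable names below are only for illustration.

```python
n_rows = 28_879_273       # shape reported in the MemoryError
bytes_per_value = 8       # an int64 value takes 8 bytes

size_mib = n_rows * bytes_per_value / 2**20  # bytes -> MiB
print(f"{size_mib:.1f} MiB")
```

That suggests the process was already close to its memory limit when this one array was requested, rather than the array being unusually large on its own.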
Is there another way to compute this feature without needing the extra 220 MiB of RAM? Is my code just too inefficient?