I have a DataFrame of orders with the order date as the index, which I set like this:

df = df.set_index('ORDER_ENTRY_DATE', drop=False)

In the code below I create a new feature containing the total amount a specific customer has successfully paid in the last 8 weeks (excluding the current order):

df["LAST_8_WEEKS_SUCCESSFUL"] = (
    df["PAYMENT_SUCCESSFUL"].mul(df["TOTAL_AMOUNT"])
    .groupby(df["CUST_NO"])
    .transform(lambda x: x.rolling(window="56D", min_periods=1).sum().shift())
    .fillna(0)
)

I have tested this code on a smaller version of my dataset and it works fine, but when I run it on the full dataset of roughly 28 million rows, I get a memory error:

MemoryError: Unable to allocate 220. MiB for an array with shape (28879273,) and data type int64
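For scale, the figure in that message follows directly from the row count: one 8-byte `int64` value per row. The short sketch below reproduces the arithmetic and shows what the same array would cost at narrower integer widths. The helper name `array_mib` is made up for illustration, and whether a narrower dtype is safe depends on the value range of the column, which is not shown here.

```python
import numpy as np

N_ROWS = 28_879_273  # array length reported in the MemoryError


def array_mib(n_rows: int, dtype: str) -> float:
    """Size in MiB of a 1-D array with n_rows elements of the given dtype."""
    return n_rows * np.dtype(dtype).itemsize / 2**20


# Compare the reported int64 array with narrower integer dtypes
for dtype in ("int64", "int32", "int8"):
    print(f"{dtype}: {array_mib(N_ROWS, dtype):.1f} MiB")
```

A single array of this size is modest, so the failure probably means the process was already close to its memory limit when this allocation was attempted.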

Is there another way to accomplish this without needing 220 MiB of RAM? Is my code too inefficient?
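One thing that may be worth checking alongside the memory question: as far as I can tell, `.shift()` moves the previous row's rolling sum onto the current row, so the window ends at the customer's previous order rather than at the current one. A possible alternative is `closed="left"`, which keeps the 56-day window anchored at the current order and leaves that order out. The sketch below assumes the same column names as above and that each customer's rows are sorted by date; the function name and the demo data are made up for illustration. It also skips the extra copy that `.shift()` creates, though that alone is unlikely to resolve the MemoryError.

```python
import pandas as pd


def last_8_weeks_successful(df: pd.DataFrame) -> pd.Series:
    """Paid amount per customer in the 56 days before each order (current order excluded)."""
    paid = df["PAYMENT_SUCCESSFUL"].mul(df["TOTAL_AMOUNT"])
    return (
        paid.groupby(df["CUST_NO"])
        # closed="left" makes the window [t - 56D, t), so the row at t is left out
        .transform(lambda x: x.rolling("56D", closed="left").sum())
        .fillna(0)  # an empty window yields NaN; treat it as nothing paid
    )


# Tiny made-up example (illustrative values, not from the real dataset)
demo = pd.DataFrame(
    {
        "ORDER_ENTRY_DATE": pd.to_datetime(
            ["2022-01-01", "2022-01-10", "2022-01-15",
             "2022-02-01", "2022-04-01", "2022-04-02"]
        ),
        "CUST_NO": ["A", "B", "A", "A", "A", "B"],
        "PAYMENT_SUCCESSFUL": [1, 1, 0, 1, 1, 1],
        "TOTAL_AMOUNT": [100, 50, 40, 30, 20, 10],
    }
).set_index("ORDER_ENTRY_DATE", drop=False)

demo["LAST_8_WEEKS_SUCCESSFUL"] = last_8_weeks_successful(demo)
print(demo[["CUST_NO", "LAST_8_WEEKS_SUCCESSFUL"]])
```

In the demo, customer A's order on 2022-04-01 comes more than 56 days after the previous one on 2022-02-01, so this version gives 0 for it, whereas the shifted version would carry the earlier window total of 130 forward.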

yalexx
  • Does this answer your question? [How can I reduce the memory of a pandas DataFrame?](https://stackoverflow.com/questions/57531388/how-can-i-reduce-the-memory-of-a-pandas-dataframe) – DocZerø Apr 07 '22 at 10:55
  • Without more detail regarding your DF, the only thing I can advise is to pick the right datatype for each column, and use categories where possible. Use `memory_usage()` to measure the current size and the possible improvement after any modification. – DocZerø Apr 07 '22 at 10:58
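A rough illustration of the dtype advice in the comments above, using `memory_usage(deep=True)` to measure before and after. The column dtypes here are assumptions, since the real schema is not shown: string customer numbers, a 0/1 payment flag stored as `int64`, and `float64` amounts. The demo data is randomly generated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Made-up data with assumed dtypes; the real schema is not shown in the question
demo = pd.DataFrame(
    {
        "CUST_NO": rng.integers(0, 500, n).astype(str),   # assumed: string ids
        "PAYMENT_SUCCESSFUL": rng.integers(0, 2, n),      # assumed: 0/1 flag as int64
        "TOTAL_AMOUNT": rng.uniform(1, 500, n).round(2),  # assumed: float64 amounts
    }
)

before = demo.memory_usage(deep=True).sum()

# Repeated ids become small integer codes plus one lookup table
demo["CUST_NO"] = demo["CUST_NO"].astype("category")
# A 0/1 flag fits in a single byte
demo["PAYMENT_SUCCESSFUL"] = demo["PAYMENT_SUCCESSFUL"].astype("int8")
# Downcast to the smallest float dtype that can hold the values
demo["TOTAL_AMOUNT"] = pd.to_numeric(demo["TOTAL_AMOUNT"], downcast="float")

after = demo.memory_usage(deep=True).sum()
print(f"before: {before / 2**20:.2f} MiB, after: {after / 2**20:.2f} MiB")
```

Note that `float32` carries only about seven significant digits, so downcasting monetary amounts may not be acceptable if exact sums matter; the integer and category conversions do not have that drawback.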

0 Answers