
Follow-up to: How can I reference the key in the Pandas dataframes within that dictionary?

The goal is still to forecast revenue by fiscal year, breaking revenue into a new column according to how much will be earned in each year. With some help, I put together code that duplicates a dataframe once per fiscal year (identical except for the Fiscal Year column), stores the copies in a dictionary, and then concatenates them into a single dataframe.

I've simplified my code to the below:

import pandas as pd

# Build a small example frame: ten IDs and their revenue figures
ID = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Revenue = [1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800]
df = pd.DataFrame({'ID': ID, 'Revenue': Revenue})
df['Fiscal Year'] = ''  # placeholder; filled in per copy below

def df_dict_func(start, end, dataframe):
    # Map each year in [start, end] to a copy of `dataframe`
    # with that year written into its Fiscal Year column
    dataframe_dict = {}
    for n in range(start, end + 1):
        sub = dataframe.copy()
        sub['Fiscal Year'] = n
        dataframe_dict[n] = sub
    return dataframe_dict

df_dict = df_dict_func(2019, 2035, df)
df = pd.concat(df_dict)
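
For reference, with the toy data above the concatenated frame has 170 rows (10 IDs × 17 fiscal years), and the dictionary keys become the outer level of a MultiIndex:

print(df.shape)      # (170, 3)
print(df.index[:3])  # MultiIndex: (2019, 0), (2019, 1), (2019, 2)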

The code works well for smaller datasets, but when I expand it to a large dataset I get a MemoryError. Is there a more memory-efficient way to produce the same result and avoid the MemoryError?

The error is specifically "MemoryError", and it is raised by the pd.concat call before any result is returned. Each of the dataframes within the dictionary is substantial in size (over 500MB).
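
One direction I have been considering is to build the expanded frame in a single allocation, rather than holding all 17 copies in the dictionary plus the concatenated result at the same time, and to downcast the numeric columns first. A rough sketch (it assumes the real ID and Revenue values fit in smaller integer types, and note that the rows come out grouped by ID rather than by fiscal year):

import numpy as np
import pandas as pd

def expand_years(dataframe, start, end):
    # Downcast numeric columns so each duplicated row costs less memory
    dataframe = dataframe.copy()
    for col in ('ID', 'Revenue'):
        dataframe[col] = pd.to_numeric(dataframe[col], downcast='integer')
    years = np.arange(start, end + 1, dtype='int16')
    # Repeat every row once per year, then tile the years across the rows;
    # this allocates the final frame once instead of dict-of-copies + concat
    out = dataframe.loc[dataframe.index.repeat(len(years))].reset_index(drop=True)
    out['Fiscal Year'] = np.tile(years, len(dataframe))
    return out

df = expand_years(df, 2019, 2035)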

DJHeels
  • On what line do you get the MemoryError? Can you post the text of the error? – killian95 Oct 19 '18 at 16:40
  • There are some [tips for enhancing performance](https://pandas.pydata.org/pandas-docs/stable/enhancingperf.html) in the documentation. There is also [this question](https://stackoverflow.com/questions/47747979/how-to-reduce-the-memory-used-by-pandas-dataframe) and [this article](https://www.dataquest.io/blog/pandas-big-data/) with some helpful tips on reducing memory size. Overall though, if it won't fit in memory, it won't fit, and you may need to process it in chunks – G. Anderson Oct 19 '18 at 16:45
  • 1
    In addition to @G.Anderson's comment, have a look at [dask](http://docs.dask.org/en/latest/dataframe-api.html). It's dataframe api is similar to pandas but it can handle dataframes larger than memory. See the [dataframe](http://docs.dask.org/en/latest/dataframe.html) docs for more information. – Chris Oct 20 '18 at 07:22
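
To make the dask route from the comments concrete, here is a minimal sketch of the same expansion done lazily; the revenue*.csv path and the output directory are hypothetical, and it assumes the source data can be read from disk in partitions:

import dask.dataframe as dd

# Read the source lazily, in partitions (the path is hypothetical)
base = dd.read_csv('revenue*.csv')
# One lazy, year-stamped view of the data per fiscal year
parts = [base.assign(**{'Fiscal Year': year}) for year in range(2019, 2036)]
out = dd.concat(parts)
# Writing computes one partition at a time, so the full result
# never has to fit in memory at once
out.to_parquet('revenue_by_fiscal_year')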
