
Follow-up to: How can I reference the key in the Pandas dataframes within that dictionary?

The goal is still to forecast revenue by fiscal year, breaking revenue into a new column according to how much will be earned in each year. With some help, I put together code that duplicates a dataframe once per fiscal year (identical except for the Fiscal Year column), stores the copies in a dictionary, and then concatenates them into a single dataframe.

I've simplified my code to the below:

import pandas as pd

# Build a small example frame: ten IDs and their revenue figures
ID = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Revenue = [1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800]
df = pd.DataFrame({'ID': ID, 'Revenue': Revenue})
df['Fiscal Year'] = ''  # placeholder; filled in per copy below

def df_dict_func(start, end, dataframe):
    # Map each year in [start, end] to a copy of `dataframe`
    # with that year written into its Fiscal Year column
    dataframe_dict = {}
    for n in range(start, end + 1):
        sub = dataframe.copy()
        sub['Fiscal Year'] = n
        dataframe_dict[n] = sub
    return dataframe_dict

df_dict = df_dict_func(2019, 2035, df)
df = pd.concat(df_dict)
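
For reference, with the toy data above the concatenated frame has 170 rows (10 IDs × 17 fiscal years), and the dictionary keys become the outer level of a MultiIndex:

print(df.shape)      # (170, 3)
print(df.index[:3])  # MultiIndex: (2019, 0), (2019, 1), (2019, 2)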

The code works well for smaller datasets, but when I expand it to a large dataset I get a MemoryError. Is there a more memory-efficient way to produce the same result and avoid the MemoryError?

The error is specifically "MemoryError", and it is raised by the pd.concat call before any result is returned. Each of the dataframes within the dictionary is substantial in size (over 500MB).
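
One direction I have been considering is to build the expanded frame in a single allocation, rather than holding all 17 copies in the dictionary plus the concatenated result at the same time, and to downcast the numeric columns first. A rough sketch (it assumes the real ID and Revenue values fit in smaller integer types, and note that the rows come out grouped by ID rather than by fiscal year):

import numpy as np
import pandas as pd

def expand_years(dataframe, start, end):
    # Downcast numeric columns so each duplicated row costs less memory
    dataframe = dataframe.copy()
    for col in ('ID', 'Revenue'):
        dataframe[col] = pd.to_numeric(dataframe[col], downcast='integer')
    years = np.arange(start, end + 1, dtype='int16')
    # Repeat every row once per year, then tile the years across the rows;
    # this allocates the final frame once instead of dict-of-copies + concat
    out = dataframe.loc[dataframe.index.repeat(len(years))].reset_index(drop=True)
    out['Fiscal Year'] = np.tile(years, len(dataframe))
    return out

df = expand_years(df, 2019, 2035)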

DJHeels
  • On what line do you get the MemoryError? Can you post the text of the error? – killian95 Oct 19 '18 at 16:40
  • There are some [tips for enhancing performance](https://pandas.pydata.org/pandas-docs/stable/enhancingperf.html) in the documentation. There is also [this question](https://stackoverflow.com/questions/47747979/how-to-reduce-the-memory-used-by-pandas-dataframe) and [this article](https://www.dataquest.io/blog/pandas-big-data/) with some helpful tips on reducing memory size. Overall though, if it won't fit in memory, it won't fit, and you may need to process it in chunks – G. Anderson Oct 19 '18 at 16:45
  • 1
    In addition to @G.Anderson's comment, have a look at [dask](http://docs.dask.org/en/latest/dataframe-api.html). It's dataframe api is similar to pandas but it can handle dataframes larger than memory. See the [dataframe](http://docs.dask.org/en/latest/dataframe.html) docs for more information. – Chris Oct 20 '18 at 07:22
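
To make the dask route from the comments concrete, here is a minimal sketch of the same expansion done lazily; the revenue*.csv path and the output directory are hypothetical, and it assumes the source data can be read from disk in partitions:

import dask.dataframe as dd

# Read the source lazily, in partitions (the path is hypothetical)
base = dd.read_csv('revenue*.csv')
# One lazy, year-stamped view of the data per fiscal year
parts = [base.assign(**{'Fiscal Year': year}) for year in range(2019, 2036)]
out = dd.concat(parts)
# Writing computes one partition at a time, so the full result
# never has to fit in memory at once
out.to_parquet('revenue_by_fiscal_year')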
