Follow up to: How can I reference the key in the Pandas dataframes within that dictionary?
The goal is still to forecast revenue by fiscal year, breaking revenue out into a new column according to how much will be earned in each year. I have code (put together with some help) that makes one copy of a dataframe per fiscal year and stores the copies in a dictionary keyed by year. The copies are identical except for the Fiscal Year column, and they are then concatenated into a single dataframe.
I've simplified my code to the below:
import pandas as pd
columns = ['ID','Revenue','Fiscal Year']
ID = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Revenue = [1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800]
FY = []
d = {'ID': ID, 'Revenue': Revenue}
df = pd.DataFrame(d)
df['Fiscal Year'] = ''
def df_dict_func(start, end, dataframe):
    date_range = range(start, end + 1)
    dataframe_dict = {}
    for n in date_range:
        sub = dataframe.copy()
        sub['Fiscal Year'] = n
        dataframe_dict[n] = sub
    return dataframe_dict
df_dict = df_dict_func(2019, 2035, df)
df = pd.concat(df_dict)
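For reference, here is a small self-contained sketch of what the concatenated result looks like on the toy data. The name `combined` is only illustrative. Because a dictionary is passed to pd.concat, the fiscal-year keys become the outer level of the resulting index.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Revenue': [1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800],
})
df['Fiscal Year'] = ''

def df_dict_func(start, end, dataframe):
    # One full copy of the dataframe per fiscal year, keyed by year.
    dataframe_dict = {}
    for n in range(start, end + 1):
        sub = dataframe.copy()
        sub['Fiscal Year'] = n
        dataframe_dict[n] = sub
    return dataframe_dict

# 'combined' is an illustrative name for the concatenated result.
combined = pd.concat(df_dict_func(2019, 2035, df))

print(combined.shape)          # 17 years x 10 rows, 3 columns
print(combined.index.nlevels)  # outer level = year key, inner level = original row label
```

So every row of the original dataframe appears once per fiscal year, which is where the size multiplies.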
The code works well on smaller datasets, but when I run it on a large dataset I get a MemoryError. Is there a more efficient way to produce the same result while avoiding the MemoryError?
The error message is just "MemoryError", and it is raised during the pd.concat call, before any result is returned. Each of the dataframes within the dictionary is substantial in size (over 500MB).
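To make clearer what I mean by "more efficient", here is a rough sketch of one possible direction, not something I have tested on the real data. It builds the repeated rows in a single step with NumPy index arrays instead of keeping a dictionary of full copies and then concatenating them. The function name `expand_by_year` is hypothetical, and storing the year as int16 is an assumption that only holds while the years fit in that range.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Revenue': [1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800],
})

def expand_by_year(dataframe, start, end):
    """Hypothetical helper: repeat every row once per fiscal year."""
    # Assumption: years fit in int16, which is smaller than the default int64.
    years = np.arange(start, end + 1, dtype='int16')
    n_rows = len(dataframe)

    # Row positions 0..n_rows-1, repeated once per year (year-major order,
    # the same row order that pd.concat over the dictionary produces).
    positions = np.tile(np.arange(n_rows), len(years))
    out = dataframe.iloc[positions]

    # Replace the duplicated row labels with a plain 0..N-1 index in place.
    out.index = pd.RangeIndex(len(out))

    # Each year value repeated n_rows times, lining up with the tiled rows.
    out['Fiscal Year'] = np.repeat(years, n_rows)
    return out

result = expand_by_year(df, 2019, 2035)
print(result.shape)
```

This only avoids holding all the intermediate copies in memory alongside the concatenated result. The final dataframe is still roughly as many times larger than the input as there are fiscal years, so it may still not fit in memory if the input itself is very large.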