The following Python code gives me three dataframes (df_apples, df_oranges, df_grapes) showing sales and price for various fruits by month. I created a list of these dataframes (df_list). I have another frame (df_forecast) that I want to append to each frame in df_list so I can create customized projections for each fruit type. However, when I try to append, it doesn't work:

import pandas as pd
import numpy as np

# HISTORY DATAFRAMES
#####################################################################################
df_apples = pd.DataFrame({'sales': [400, 450, 500, 545, 550], 
     'price': [3.00, 2.75, 3.44, 4.00, 5.32], 
     'date' : ['2022-10-31','2022-11-30','2022-12-31','2023-01-31','2023-02-28']})

df_oranges = pd.DataFrame({'sales': [50, 65, 60, 80, 110], 
     'price': [0.50, 0.45, 0.30, 0.35, 0.40], 
     'date' : ['2022-10-31','2022-11-30','2022-12-31','2023-01-31','2023-02-28']})

df_grapes = pd.DataFrame({'sales': [300, 350, 360, 380, 510], 
     'price': [1.05, 1.10, 1.35, 1.55, 0.95], 
     'date' : ['2022-10-31','2022-11-30','2022-12-31','2023-01-31','2023-02-28']})

df_list=[df_apples,df_oranges,df_grapes]


# FORECAST PERIOD DATAFRAME
####################################################################################
index = pd.date_range('2023-03-31', periods=6, freq='M')
columns = ['sales','price']

df_forecast = pd.DataFrame(index=index, columns=columns)

# HISTORY + FORECAST FRAMES TOGETHER
#####################################

for x in df_list:
    x.set_index(pd.to_datetime(x['date']), inplace=True)   # convert date from object to datetime
    x.drop('date', axis=1, inplace=True)    
    x = x.append(df_forecast)

It's like df_forecast is not appending at all...showing df_apples as an example:

[image: df_apples output, not preserved]

When in fact I want this:

[image: desired df_apples output, not preserved]

What's wrong?

jack homareau

3 Answers

import pandas as pd
import numpy as np

# HISTORY DATAFRAMES
#####################################################################################
df_apples = pd.DataFrame({'sales': [400, 450, 500, 545, 550], 
     'price': [3.00, 2.75, 3.44, 4.00, 5.32], 
     'date' : ['2022-10-31','2022-11-30','2022-12-31','2023-01-31','2023-02-28']})

df_oranges = pd.DataFrame({'sales': [50, 65, 60, 80, 110], 
     'price': [0.50, 0.45, 0.30, 0.35, 0.40], 
     'date' : ['2022-10-31','2022-11-30','2022-12-31','2023-01-31','2023-02-28']})

df_grapes = pd.DataFrame({'sales': [300, 350, 360, 380, 510], 
     'price': [1.05, 1.10, 1.35, 1.55, 0.95], 
     'date' : ['2022-10-31','2022-11-30','2022-12-31','2023-01-31','2023-02-28']})

df_list=[df_apples,df_oranges,df_grapes]

# FORECAST PERIOD DATAFRAME
####################################################################################
index = pd.date_range('2023-03-31', periods=6, freq='M')
columns = ['sales','price']

df_forecast = pd.DataFrame(index=index, columns=columns)

# HISTORY + FORECAST FRAMES TOGETHER
#####################################

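for x in range(len(df_list)):
    df_list[x].set_index(pd.to_datetime(df_list[x]['date']), inplace=True)   # convert date from object to datetime
    df_list[x].drop('date', axis=1, inplace=True)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df_list[x] = pd.concat([df_list[x], df_forecast])   # store the extended frame back in the list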

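Before the loop, df_list[0] and df_apples are the same object: the list holds references, not copies. The in-place set_index and drop calls therefore change df_apples as well.

The assignment on the last line of the loop is different. It stores a new DataFrame in df_list[0], so df_list[0] contains the apples history plus the forecast rows, while df_apples itself gets no forecast rows.

df_list[1] holds oranges, and so on.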
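
To make the reference behaviour concrete, here is a minimal sketch. The two-row frame and df_extra are hypothetical toy data, and pd.concat stands in for append (which newer pandas versions no longer provide):

```python
import pandas as pd

# Hypothetical toy data, only to illustrate object identity
df_apples = pd.DataFrame({'sales': [400, 450], 'price': [3.00, 2.75]})
df_extra = pd.DataFrame({'sales': [500], 'price': [3.44]})
df_list = [df_apples]

# The list slot and the variable name refer to the same object
same_before = df_list[0] is df_apples

# Reassigning the slot stores a new DataFrame in the list only
df_list[0] = pd.concat([df_list[0], df_extra])

same_after = df_list[0] is df_apples
print(same_before, same_after, len(df_apples), len(df_list[0]))
# prints: True False 2 3
```

The name df_apples still refers to the two-row frame; only the list slot refers to the combined one.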

geekay

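It's because of this:

Can't modify list elements in a loop

In your loop, `x = x.append(df_forecast)` only rebinds the local name x; the element stored in df_list is not replaced. Use an index-based for loop instead and write the result back into the list: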

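for idx, x in enumerate(df_list):
    x.set_index(pd.to_datetime(x['date']), inplace=True)   # convert date from object to datetime
    x.drop('date', axis=1, inplace=True)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df_list[idx] = pd.concat([x, df_forecast])   # write the extended frame back into the list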
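
The same effect can be seen without pandas. A minimal sketch with a plain list of integers (the names below are illustrative only):

```python
values = [1, 2, 3]

# Rebinding the loop variable leaves the list untouched
for v in values:
    v = v * 10
after_rebinding = list(values)

# Assigning through the index replaces the stored element
for idx, v in enumerate(values):
    values[idx] = v * 10
after_index_assignment = list(values)

print(after_rebinding)          # [1, 2, 3]
print(after_index_assignment)   # [10, 20, 30]
```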
add-IV
import pandas as pd
import numpy as np

# HISTORY DATAFRAMES
#####################################################################################
df_apples = pd.DataFrame({'sales': [400, 450, 500, 545, 550], 
     'price': [3.00, 2.75, 3.44, 4.00, 5.32], 
     'date' : ['2022-10-31','2022-11-30','2022-12-31','2023-01-31','2023-02-28']})

df_oranges = pd.DataFrame({'sales': [50, 65, 60, 80, 110], 
     'price': [0.50, 0.45, 0.30, 0.35, 0.40], 
     'date' : ['2022-10-31','2022-11-30','2022-12-31','2023-01-31','2023-02-28']})

df_grapes = pd.DataFrame({'sales': [300, 350, 360, 380, 510], 
     'price': [1.05, 1.10, 1.35, 1.55, 0.95], 
     'date' : ['2022-10-31','2022-11-30','2022-12-31','2023-01-31','2023-02-28']})

df_dict = {'df_apples': df_apples, 'df_oranges': df_oranges, 'df_grapes': df_grapes}

# FORECAST PERIOD DATAFRAME
####################################################################################
index = pd.date_range('2023-03-31', periods=6, freq='M')
columns = ['sales','price']

df_forecast = pd.DataFrame(index=index, columns=columns)

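for name, df in df_dict.items():
    df.set_index(pd.to_datetime(df['date']), inplace=True)   # convert date from object to datetime
    df.drop('date', axis=1, inplace=True)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df_dict[name] = pd.concat([df, df_forecast])   # store the extended frame under the same key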

# This is hopefully what is needed:
df_dict['df_apples']

Once the loop stores the appended result back in the container, its elements no longer point to the original data frames; they are new objects (ref: https://realpython.com/pointers-in-python/). This can be verified after the loop with the id() function, which returns an object's identity (its memory address in CPython):

id(df_apples)
id(df_dict['df_apples'])
# After the loop, these two values differ: the names refer to different objects.

So to achieve the desired outcome, the for loop can be rewritten as above (similarly to the answers by add-IV and geekay). A dictionary (df_dict) is used instead of df_list so that each frame can be looked up by its original variable name. Note that df_apples itself will not receive the forecast rows (although the in-place set_index and drop calls do modify it); df_dict['df_apples'] holds the combined frame.
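
If the original frames should stay completely untouched, the in-place calls can be avoided. The following is only a sketch under some assumptions: with_forecast is a hypothetical helper name, the data is shortened to two rows and two fruits, df_forecast is created with dtype=float so the combined columns are numeric, and the month-end frequency is passed as an offset object rather than the 'M' alias:

```python
import pandas as pd

dates = ['2022-10-31', '2022-11-30']
df_apples = pd.DataFrame({'sales': [400, 450], 'price': [3.00, 2.75], 'date': dates})
df_oranges = pd.DataFrame({'sales': [50, 65], 'price': [0.50, 0.45], 'date': dates})

index = pd.date_range('2023-03-31', periods=6, freq=pd.offsets.MonthEnd())
df_forecast = pd.DataFrame(index=index, columns=['sales', 'price'], dtype=float)


def with_forecast(df, forecast):
    """Return a new frame: history indexed by date, followed by the forecast rows."""
    # assign/set_index return new objects, so df itself is not modified
    history = df.assign(date=pd.to_datetime(df['date'])).set_index('date')
    return pd.concat([history, forecast])


frames = {'df_apples': df_apples, 'df_oranges': df_oranges}
df_dict = {name: with_forecast(df, df_forecast) for name, df in frames.items()}

print(df_dict['df_apples'].shape)   # (8, 2)
print(list(df_apples.columns))      # ['sales', 'price', 'date']
```

Here df_apples keeps its date column and default index, and the combined frames live only in df_dict.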

Adrian Mole
sarksi
  • Thanks...that looks good! Can we add a line in the loop so that each of the 3 history frames will have the df_forecast portion added...so when I type df_apple (for instance) I get the history plus the appended forecast? – jack homareau Feb 02 '23 at 20:41
  • The `df_apples` object won't be modified unless you explicitly edit it (using its variable name). You could add something like `if(name) == 'df_apples': df_apples = df_apples.append(df_forecast)`, and repeat the same for the other two df's, but that would defeat the purpose of having a for loop. – sarksi Feb 02 '23 at 21:33
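
One possible way to get the forecast rows under the original variable names without a per-name `if` is to build all extended frames in one expression and unpack the result back into those names. A minimal sketch, assuming the frames are already indexed by date (the one-column toy data below is hypothetical, and pd.concat is used in place of append):

```python
import pandas as pd

history_index = pd.to_datetime(['2023-01-31', '2023-02-28'])
df_apples = pd.DataFrame({'sales': [545.0, 550.0]}, index=history_index)
df_oranges = pd.DataFrame({'sales': [80.0, 110.0]}, index=history_index)
df_forecast = pd.DataFrame({'sales': [float('nan')]}, index=pd.to_datetime(['2023-03-31']))

# The right-hand side is evaluated first, then the names are rebound in order
df_apples, df_oranges = [pd.concat([df, df_forecast]) for df in (df_apples, df_oranges)]

print(len(df_apples), len(df_oranges))   # 3 3
```

The names still have to be listed once on each side, so this only pays off when the per-frame work is more than a single line.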