Please see code below.
Overview
I have a dataframe 'df_master' that contains four columns: Date; the name of a specific df from which data should be pulled; measurement on specified Date; and measurement one year after specified Date.
What I am trying to do is write code so that, for each row in df_master, the code pulls measurement data into df_master from one of a large number of other dfs. Each row will have a different df from which it is pulling measurement data. I want to populate two new columns in df_master. One would be the measurement data pulled from the specified df for the date listed in that row in df_master, and the other would be measurement data on the date that is one year in the future.
Example
For instance, consider the code below. The first row in df_master is '2016-01-01' and 'df_B'. This means that the first row of df_master should populate with a value of 250 for 'Measurement_Today' and 265 for 'Measurement_One_Year_in_Future'.
The next row of df_master would need to pull the Measurement data from df_C, etc. In other words, each row of df_master would specify the df from which the Measurement data would come from, and that df differs by row.
Assume that there are thousands of dfs from which df_master must pull data, and combining all of those thousands of dfs into one df is not possible.
I have tried to write a for loop that would pull this data in without success. I also tried to use .iloc to pull in the data for one year in the future, but that is not working, either.
Any help you can provide would be greatly appreciated. Thank you.
Code
# Import dependencies
import pandas as pd
import numpy as np
# Create 'df_A', 'df_B', 'df_C' and 'df_D', which contain measurement data on specific dates.
df_A = pd.DataFrame(np.array([['2016-01-01', 150], ['2017-01-01', 145],
['2018-01-01', 163], ['2019-01-01', 170],
['2020-01-01', 198], ['2021-01-01', 189],]),
columns=['Date', 'Measurement'])
df_A['Date'] = pd.to_datetime(df_A['Date'])
df_B = pd.DataFrame(np.array([['2016-01-01', 250], ['2017-01-01', 265],
['2018-01-01', 221], ['2019-01-01', 285],
['2020-01-01', 298], ['2021-01-01', 289],]),
columns=['Date', 'Measurement'])
df_B['Date'] = pd.to_datetime(df_B['Date'])
df_C = pd.DataFrame(np.array([['2016-01-01', 350], ['2017-01-01', 367],
['2018-01-01', 392], ['2019-01-01', 370],
['2020-01-01', 398], ['2021-01-01', 389],]),
columns=['Date', 'Measurement'])
df_C['Date'] = pd.to_datetime(df_C['Date'])
df_D = pd.DataFrame(np.array([['2016-01-01', 450], ['2017-01-01', 454],
['2018-01-01', 413], ['2019-01-01', 480],
['2020-01-01', 498], ['2021-01-01', 489],]),
columns=['Date', 'Measurement'])
df_D['Date'] = pd.to_datetime(df_D['Date'])
# Create df_master
df_master = pd.DataFrame(np.array([['2016-01-01', 'df_B','','' ], ['2017-01-01', 'df_C','','' ],
['2018-01-01', 'df_B','','' ], ['2019-01-01', 'df_A','','' ],
['2018-01-01', 'df_A','','' ], ['2019-01-01', 'df_D','','' ],]),
columns=['Date', 'df_to_pull_measurement_from', 'Measurement_Today',
'Measurement_Next_Year'])
df_master['Date'] = pd.to_datetime(df_master['Date'])
# Create list of dfs from df_master['df_to_pull_measurement_from'].
list_of_dfs = df_master['df_to_pull_measurement_from']
### THIS DOES NOT WORK ###
# Add columns to df_master for measurement on given date and measurement one year into future.
for row in list_of_dfs:
selected_df = row + '[\'Measurement\']'
df_master['Measurement_Today'] = selected_df
df_master['Measurement_Next_Year'] = selected_df.iloc[idx +1]