I have three DataFrames (one master DataFrame and two additional DataFrames). I am trying to add a column to my master DataFrame whose elements are cell values taken from the other two DataFrames. Two columns of the master DataFrame determine which of the two DataFrames to read from, and two more columns act as the row and column index of a particular cell in the selected DataFrame.
import pandas as pd

master_df = pd.DataFrame({
'col1': ['M', 'F', 'F', 'M'],
'col2': [0, 1, 2, 3],
'col3': ['X', 'Z', 'Z', 'X'],
'col4': [2021, 2022, 2023, 2024]
})
df1 = pd.DataFrame({
2021: [.632, .214, .987, .555],
2022: [.602, .232, .287, .552],
2023: [.932, .209, .347, .725],
2024: [.123, .234, .9873, .5005]
})
df2 = pd.DataFrame({
2021: [.6123, .2214, .4987, .555],
2022: [.6702, .232, .2897, .552],
2023: [.9372, .2, .37, .725],
2024: [.23, .24, .873, .005]
})
For each row of the master_df, if the col1 value is 'M' and the col3 value is 'X', I want to choose df1. If the col1 value is 'F' and the col3 value is 'Z', I want to choose df2. Once I have selected the appropriate DataFrame, I want to use col2 of the master_df as a row index and col4 of the master_df as a column index. Finally, I will get the selected cell value and put it into the new column to be added to the master_df.
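To make the rule precise, here is a minimal per-row sketch of it. The function name lookup_cell is hypothetical, and returning None for combinations other than ('M', 'X') and ('F', 'Z') is an assumption, since the rule above does not say what should happen in those cases.

```python
import pandas as pd

def lookup_cell(col1, col2, col3, col4, df1, df2):
    """Return the cell selected by one master_df row (hypothetical helper)."""
    if col1 == 'M' and col3 == 'X':
        source = df1
    elif col1 == 'F' and col3 == 'Z':
        source = df2
    else:
        return None  # assumption: unmatched combinations yield no value
    # col2 is used as the row label, col4 as the column label
    return source.loc[col2, col4]

# First two rows and columns of df1 and df2 from the example data
df1 = pd.DataFrame({2021: [.632, .214], 2022: [.602, .232]})
df2 = pd.DataFrame({2021: [.6123, .2214], 2022: [.6702, .232]})

first = lookup_cell('M', 0, 'X', 2021, df1, df2)   # reads df1.loc[0, 2021]
second = lookup_cell('F', 1, 'Z', 2022, df1, df2)  # reads df2.loc[1, 2022]
```

This is only a statement of the desired behavior for a single row; calling it once per row is the kind of approach that does not scale to millions of rows.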
In this example, master_df should look like this at the end:
master_df = pd.DataFrame({
'col1': ['M', 'F', 'F', 'M'],
'col2': [0, 1, 2, 3],
'col3': ['X', 'Z', 'Z', 'X'],
'col4': [2021, 2022, 2023, 2024],
'col5': [.632, .232, .37, .5005]
})
I have tried using a for loop to iterate through the master_df, but it is extremely slow since the DataFrames that I'm working with have millions of rows each. Any efficient pandas solutions for this?
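One direction that might avoid the loop is sketched below. It assumes that df1 and df2 share the same index and columns, that those labels are unique, and that rows matching neither rule should get NaN; none of these assumptions is guaranteed by the description above. The idea is to translate the labels in col2 and col4 into integer positions with Index.get_indexer, index both frames as NumPy arrays in one step, and choose between the two results with numpy.select.

```python
import numpy as np
import pandas as pd

master_df = pd.DataFrame({
    'col1': ['M', 'F', 'F', 'M'],
    'col2': [0, 1, 2, 3],
    'col3': ['X', 'Z', 'Z', 'X'],
    'col4': [2021, 2022, 2023, 2024],
})
df1 = pd.DataFrame({
    2021: [.632, .214, .987, .555],
    2022: [.602, .232, .287, .552],
    2023: [.932, .209, .347, .725],
    2024: [.123, .234, .9873, .5005],
})
df2 = pd.DataFrame({
    2021: [.6123, .2214, .4987, .555],
    2022: [.6702, .232, .2897, .552],
    2023: [.9372, .2, .37, .725],
    2024: [.23, .24, .873, .005],
})

# Boolean masks for the two selection rules
use_df1 = ((master_df['col1'] == 'M') & (master_df['col3'] == 'X')).to_numpy()
use_df2 = ((master_df['col1'] == 'F') & (master_df['col3'] == 'Z')).to_numpy()

# Translate labels to integer positions (-1 marks a label that was not found).
# Assumption: df1 and df2 have identical, unique index and column labels.
row_pos = df1.index.get_indexer(master_df['col2'].to_numpy())
col_pos = df1.columns.get_indexer(master_df['col4'].to_numpy())
found = (row_pos >= 0) & (col_pos >= 0)

# Pick one cell per master row from each frame using NumPy integer indexing
vals1 = df1.to_numpy()[row_pos, col_pos]
vals2 = df2.to_numpy()[row_pos, col_pos]

# Keep the value from the frame each row's rule selects; NaN otherwise
master_df['col5'] = np.select(
    [use_df1 & found, use_df2 & found],
    [vals1, vals2],
    default=np.nan,
)
```

The found mask matters because a position of -1 would otherwise silently read the last row or column instead of signalling a missing label.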