I have a dataframe that contains some missing values where the 'new_id' is an empty string. I have another dataframe that contains the values it should have, but does not have all the columns in the original dataframe so I can't just replace those rows by index. My original dataframe looks like:
df = pd.DataFrame({
"id": ["1", "2", "3", "4", "5"],
"new_id": ["", "", "23", "", "52"],
"color": ["blue", "red", "green", "yellow", "green"],
"age": [23, 11, 17, 13, 51],
"trade": ["", "", "C", "", "B"],
"color2": ["red", "yellow", "red", "blue", "purple"],
"fruit": ["", "", "orange", "", "grape"]
})
id new_id color age trade color2 fruit
1 blue 23 red
2 red 11 yellow
3 23 green 17 C red orange
4 yellow 13 blue
5 52 green 51 B purple grape
The table with the data I need is:
df_map = pd.DataFrame({
"id": ["1", "2", "4"],
"new_id": ["", "", ""],
"trade": ["B", "C", "A"],
"fruit": ["apple", "orange", "apple"]
})
id new_id trade fruit
1 B apple
2 C orange
4 A apple
Desired output:
id new_id color age trade color2 fruit
1 blue 23 B red apple
2 red 11 C yellow orange
3 23 green 17 C red orange
4 yellow 13 A blue apple
5 52 green 51 B purple grape
How can I combine the information in the two data frames to get my full dataset, and only replace values where 'new_id' is an empty string.