1

How can I replace values in a pandas dataframe with values from another dataframe based common columns.

I need to replace NaN values in dataframe1 based on the common columns of "types" and "o_period". any suggestion?

df1

types  c_years  o_periods  s_months  incidents
0       1        1          1     127.0        0.0
1       1        1          2      63.0        0.0
2       1        2          1    1095.0        3.0
3       1        2          2    1095.0        4.0
4       1        3          1    1512.0        6.0
5       1        3          2    3353.0       18.0
6       1        4          1       NaN        NaN
7       1        4          2    2244.0       11.0
8       2        1          1   44882.0       39.0
9       2        1          2   17176.0       29.0
10      2        2          1   28609.0       58.0
11      2        2          2   20370.0       53.0
12      2        3          1    7064.0       12.0
13      2        3          2   13099.0       44.0
14      2        4          1       NaN        NaN
15      2        4          2    7117.0       18.0
16      3        1          1    1179.0        1.0
17      3        1          2     552.0        1.0
18      3        2          1     781.0        0.0
19      3        2          2     676.0        1.0
20      3        3          1     783.0        6.0
21      3        3          2    1948.0        2.0
22      3        4          1       NaN        NaN
23      3        4          2     274.0        1.0
24      4        1          1     251.0        0.0
25      4        1          2     105.0        0.0
26      4        2          1     288.0        0.0
27      4        2          2     192.0        0.0
28      4        3          1     349.0        2.0
29      4        3          2    1208.0       11.0
30      4        4          1       NaN        NaN
31      4        4          2    2051.0        4.0
32      5        1          1      45.0        0.0
33      5        1          2       NaN        NaN
34      5        2          1     789.0        7.0
35      5        2          2     437.0        7.0
36      5        3          1    1157.0        5.0
37      5        3          2    2161.0       12.0
38      5        4          1       NaN        NaN
39      5        4          2     542.0        1.0

df2

 types  o_periods  s_months  incidents
0      1          1     911.0        3.0
1      1          2    1689.0        8.0
2      2          1   26852.0       36.0
3      2          2   14440.0       36.0
4      3          1     914.0        2.0
5      3          2     862.0        1.0
6      4          1     296.0        1.0
7      4          2     889.0        4.0
8      5          1     664.0        4.0
9      5          2    1047.0        7.0

df3:rows with NaN

    types  c_years  o_periods  s_months  incidents
6       1        4          1       NaN        NaN
14      2        4          1       NaN        NaN
22      3        4          1       NaN        NaN
30      4        4          1       NaN        NaN
33      5        1          2       NaN        NaN
38      5        4          1       NaN        NaN

I have tried to merge df2 with df3 but the indexing seems to reset.

Cedric
  • 79
  • 5
  • Please refrain from posting images. Instead, copy paste your data (e.g. output of `print(df)` directly into your question. Also, please post your expected output along with what you have tried. – Chris Aug 24 '21 at 03:52
  • It would help if you pasted *text values* for the sample data in df1 and df2 instead of screenshots. The text values could be used to answer your question. – Gui LeFlea Aug 24 '21 at 03:56

1 Answers1

1

First separate the rows where you have NaN values out into a new dataframe called df3 and drop the rows where there are NaN values from df1.

Then do a left join based on the new dataframe.

df4 = pd.merge(df3,df2,how='left',on=['types','o_period'])

After that is done, append the rows from df4 back into df1.

Another way is to combine the 2 columns you want to lookup into a single column

df1["types_o"] = df1["types_o"].astype(str) + df1["o_period"].astype(str)

df2["types_o"] = df2["types_o"].astype(str) + df2["o_period"].astype(str)

Then you can do a look up on the missing values.

df1.types_o.replace('Nan', np.NaN, inplace=True)

df1.loc[df1['s_months'].isnull(),'s_months'] = df2['types_o'].map(df1.types_o)

df1.loc[df1['incidents'].isnull(),'incidents'] = df2['types_o'].map(df1.types_o)

You didn't paste any code or examples of your data which is easily reproducible so this is the best I can do.

anarchy
  • 3,709
  • 2
  • 16
  • 48
  • Hi, I extracted the rows with NaN values in a df called df3. However when I try to merge df3 and df2, the rows index resetted. – Cedric Aug 24 '21 at 09:17
  • If doesn’t matter right? Because you’re merging on different columns – anarchy Aug 24 '21 at 09:24
  • Does the order of the data matter to you? If it does you can create a hard coded index column – anarchy Aug 24 '21 at 20:04
  • yup, I hard coded the index and appended the dataframe back. Thanks for your help though!! I used your method to left join the 2 dataframes – Cedric Aug 25 '21 at 10:38
  • If my answer helped please mark it correct, thank you! – anarchy Aug 25 '21 at 10:39