Fill NaN's within 1 column of a df via lookup to another df via pandas

Question

I seen various versions of this question but none of them seem to fit with what I am attempting to do: here's my data:

Here's the df with the NaNs:

df = pd.DataFrame({"A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
               "B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
               "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"]})

       A      B      C
0  10023      ,  small
1  10040  17,-6  large
2    NaN  19,-2  large
3  12345  17,-5  small
4    NaN  37,-5  small
5    NaN      ,  large
6  10033  9,-10  small
7    NaN  19,-2  small
8    NaN   2,-5  large

Next I have the lookup df called df2:

df2 = pd.DataFrame({"B": ['17,-5', '19,-2', '37,-5', '9,-10'],
                "A": ["10040", "54321", "12345", "10033"]})

       B      A
0  17,-5  10040
1  19,-2  54321
2  37,-5  12345
3  9,-10  10033

I would like to fill in the NaNs of column A on df by looking up column df2.B and returning df2.A such that the resulting dfr looks like this:

       A      B      C
0  10023      ,  small
1  10040  17,-6  large
2  54321  19,-2  large
3  10040  17,-5  small
4  12345  37,-5  small
5    NaN      ,  large
6  10033  9,-10  small
7  54321  19,-2  small
8    NaN   2,-5  large

Important caveats:

The dfs do not have matching indexes
The contents of df.A and df2.A are non-unique()
The rows of df2 do make up unique pairs.
Assume that there are more columns, not shown, with NaNs.

Using pandas, the rows of interest on df would be found (I think) via: df.loc[df['A'].isnull(),]. This answer seemed promising but I'm unclear where df1 in that example comes from. My actual data set is much, much larger than this and I'll have to be doing replacing several columns in this way.

score 1 · Accepted Answer · answered Mar 10 '19 at 23:50

Just using np.where

df.A=np.where(df.A.isnull(),df.B.map(df2.set_index('B').A),df.A)
df
Out[149]: 
       A      B      C
0  10023      ,  small
1  10040  17,-6  large
2  54321  19,-2  large
3  12345  17,-5  small
4  12345  37,-5  small
5    NaN      ,  large
6  10033  9,-10  small
7  54321  19,-2  small
8    NaN   2,-5  large

score 1 · Answer 2 · answered Mar 11 '19 at 00:39

1

The map method from Wen-Ben will be faster in terms of speed, but here's another way you can solve this problem, just for your convenience and knowledge

You can use pd.merge, because this is basically a join problem. After the merge, we fillna and drop the columns we dont need.

df_final = pd.merge(df, df2, on='B', how='left', suffixes=['_1','_2'])
df_final['A'] = df_final.A_1.fillna(df_final.A_2)
df_final.drop(['A_1', 'A_2'], axis=1, inplace=True)

print(df_final)
       B      C      A
0      ,  small  10023
1  17,-6  large  10040
2  19,-2  large  54321
3  17,-5  small  12345
4  37,-5  small  12345
5      ,  large    NaN
6  9,-10  small  10033
7  19,-2  small  54321
8   2,-5  large    NaN

answered Mar 11 '19 at 00:39

Erfan

40,971
8
66
78

thank you - I was working on a `merge` version, too but I also saw the `map` was speedy. Good lookin' out. – jmb277 Mar 11 '19 at 00:52
No problem, for any questions, please ask :) – Erfan Mar 11 '19 at 00:55
2

btw @jmb277 if you really want to understand `merging` in depth. This post is perfect for that: https://stackoverflow.com/questions/53645882/pandas-merging-101 – Erfan Mar 11 '19 at 00:58

Fill NaN's within 1 column of a df via lookup to another df via pandas

2 Answers2