I seen various versions of this question but none of them seem to fit with what I am attempting to do: here's my data:
Here's the df with the NaN
s:
df = pd.DataFrame({"A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
"B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
"C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"]})
A B C
0 10023 , small
1 10040 17,-6 large
2 NaN 19,-2 large
3 12345 17,-5 small
4 NaN 37,-5 small
5 NaN , large
6 10033 9,-10 small
7 NaN 19,-2 small
8 NaN 2,-5 large
Next I have the lookup df called df2
:
df2 = pd.DataFrame({"B": ['17,-5', '19,-2', '37,-5', '9,-10'],
"A": ["10040", "54321", "12345", "10033"]})
B A
0 17,-5 10040
1 19,-2 54321
2 37,-5 12345
3 9,-10 10033
I would like to fill in the NaN
s of column A
on df
by looking up column df2.B
and returning df2.A
such that the resulting dfr
looks like this:
A B C
0 10023 , small
1 10040 17,-6 large
2 54321 19,-2 large
3 10040 17,-5 small
4 12345 37,-5 small
5 NaN , large
6 10033 9,-10 small
7 54321 19,-2 small
8 NaN 2,-5 large
Important caveats:
- The
df
s do not have matching indexes - The contents of
df.A
anddf2.A
are non-unique() - The rows of
df2
do make up unique pairs. - Assume that there are more columns, not shown, with
NaN
s.
Using pandas, the rows of interest on df
would be found (I think) via: df.loc[df['A'].isnull(),]
. This answer seemed promising but I'm unclear where df1
in that example comes from. My actual data set is much, much larger than this and I'll have to be doing replacing several columns in this way.