This is a lot more complicated than it looks at first glance. df1 is in long form: it has two entries for 'b'. So first it needs to be stacked/unstacked/pivoted into a 3x3 table of 'Result', where 'Column' becomes the index and the 'Item' values 'x'/'y'/'z' are expanded into a full 3x3 matrix, with NaN for missing values:
>>> df1_full = df1.pivot(index='Column', columns='Item', values='Result')
>>> df1_full
Item      x    y    z
Column
a       3.0  NaN  NaN
b       NaN  4.0  5.0
c       6.0  NaN  NaN
(Note the unwanted type conversion to float: this is because numpy doesn't have a NaN for integers — see pandas Issue 17013 in pre-0.22.0 versions. No problem, we'll just cast back to int at the end.)
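Alternatively, on reasonably recent pandas (the 'Int64' extension dtype, pandas >= 0.24) you can sidestep the float cast entirely with the nullable integer dtype. A sketch, using a hypothetical df1 that reproduces the long-form frame above:

```python
import numpy as np
import pandas as pd

# Hypothetical df1 matching the long-form frame described above
df1 = pd.DataFrame({'Column': ['a', 'b', 'b', 'c'],
                    'Item':   ['x', 'y', 'z', 'x'],
                    'Result': [3, 4, 5, 6]})

df1_full = df1.pivot(index='Column', columns='Item', values='Result')
# Cast to pandas' nullable integer dtype so NaN doesn't force float64
df1_full = df1_full.astype('Int64')
print(df1_full)
```

The missing cells come out as pd.NA instead of NaN, and the valid cells stay integers.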
Now we want to do df1_full.merge(df2, left_index=True, right_on=??)
But first we need another trick/intermediate column to find the leftmost valid value in df2 which corresponds to a valid column name from df1; the value 'n' is invalid, so we replace it with NaN to make life easier:
>>> df2_nan = df2.replace('n', np.NaN)
>>> df2_nan
     a    b    c
0    x    x    x
1  NaN    y    z
2  NaN  NaN  NaN
>>> df2_nan.columns = [0, 1, 2]
>>> df2_nan
     0    1    2
0    x    x    x
1  NaN    y    z
2  NaN  NaN  NaN
And we want to successively test df2's columns from left to right as to whether their value is in df1_full.columns, similar to "Computing the first non-missing value from each column in a DataFrame", except testing successive columns (axis=1). Then store that intermediate column name into a new column, 'join_col':
>>> df2['join_col'] = df2.replace('n', np.NaN).apply(pd.Series.first_valid_index, axis=1)
>>> df2
   a  b  c join_col
0  x  x  x        a
1  n  y  z        b
2  n  n  n     None
Actually we want to index into the column names of df1, but it blows up on the NaN:
>>> df1.columns[ df2_nan.apply(pd.Series.first_valid_index, axis=1) ]
(Well that's not exactly working, but you get the idea.)
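One way to avoid the blow-up: first_valid_index already returns the column label itself (or None for an all-NaN row), so there's no need to index back into a columns object at all; just drop the unmatched rows before merging. A sketch with a hypothetical df2 that reproduces the frame above:

```python
import numpy as np
import pandas as pd

# Hypothetical df2 matching the frame shown above
df2 = pd.DataFrame({'a': ['x', 'n', 'n'],
                    'b': ['x', 'y', 'n'],
                    'c': ['x', 'z', 'n']})

# first_valid_index returns the column *label* of the first non-NaN
# value in each row, or None when the whole row is NaN
df2['join_col'] = (df2.replace('n', np.nan)
                      .apply(pd.Series.first_valid_index, axis=1))

# Rows where every value was invalid get join_col == None; drop them
df2_valid = df2.dropna(subset=['join_col'])
print(df2_valid)
```

Since None counts as missing, dropna(subset=['join_col']) discards exactly the rows that had no valid value.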
Finally we do the merge df1_full.merge(df2, left_index=True, right_on='join_col'). And maybe take the desired column slice ['a','b','c','Result']. And cast Result back to int, or map NaN -> 'null'.
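Putting the steps together, here is a runnable sketch, assuming hypothetical df1/df2 inputs that reproduce the frames shown above (the final column slice will depend on what you actually want out of the merge):

```python
import numpy as np
import pandas as pd

# Hypothetical inputs matching the frames shown above
df1 = pd.DataFrame({'Column': ['a', 'b', 'b', 'c'],
                    'Item':   ['x', 'y', 'z', 'x'],
                    'Result': [3, 4, 5, 6]})
df2 = pd.DataFrame({'a': ['x', 'n', 'n'],
                    'b': ['x', 'y', 'n'],
                    'c': ['x', 'z', 'n']})

# Step 1: pivot df1 into a wide Column x Item table
df1_full = df1.pivot(index='Column', columns='Item', values='Result')

# Step 2: derive the join key from the leftmost valid value in each df2 row
df2['join_col'] = (df2.replace('n', np.nan)
                      .apply(pd.Series.first_valid_index, axis=1))

# Step 3: merge on that key; rows with join_col == None simply don't match
merged = df1_full.merge(df2, left_index=True, right_on='join_col')
print(merged)
```

The default inner join silently drops the all-invalid row (join_col == None), so no separate dropna is needed here.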