1

I have two Dataframes DF1 and DF2. My goal is to look up DF2 with DF1 columns as keys; and save the returns as outcomes in DF3. Can someone help me with getting DF3?

e.g.

DF1                         DF2
map test1   test2           No.     outcome
A   NaN     NaN             1       AA
B   NaN     5               2       BB
C   1       6               3       CC
D   2       7               4       DD
E   3       NaN             5       EE
F   4       NaN             6       FF
G   5       8               7       GG
H   6       9               8       HH
I   7       10              9       II
                            10      JJ
                            11      KK
                            12      LL
                            13      MM

DF3
map test1   test2   outcome1        outcome2
A   NaN     NaN     NaN             NaN
B   NaN     5       NaN             EE
C   1       6       AA              FF
D   2       7       BB              GG
E   3       NaN     CC              NaN
F   4       NaN     DD              NaN
G   5       8       EE              HH
H   6       9       FF              II
I   7       10      GG              JJ

I am currently using two join functions, but this is not what I need. It drops NaNs in DF1, and only returns the overlap of test1 and test2.

df3 = df1.merge(df2, how='inner', left_on='test1', right_on='No.')
df3 = df3.merge(df2, how='inner', left_on='test2', right_on='No.')

currently my code will return this:

DF3
map test1 test2 outcome1 outcome2 C 1 6 AA FF D 2 7 BB GG G 5 8 EE HH H 6 9 FF II I 7 10 GG JJ

2 Answers2

1

Do you like code golf?

pd.merge(pd.merge(df1, df2, how='left',left_on='test1',right_on='No.'),df2, how='left',left_on='test2',right_on='No.').drop(['No._x','No._y'], axis=1).rename(columns={'outcome_x':'outcome1','outcome_y':'outcome2'})

  map  test1  test2 outcome1 outcome2
0   A    NaN    NaN      NaN      NaN
1   B    NaN    5.0      NaN       EE
2   C    1.0    6.0       AA       FF
3   D    2.0    7.0       BB       GG
4   E    3.0    NaN       CC      NaN
5   F    4.0    NaN       DD      NaN
6   G    5.0    8.0       EE       HH
7   H    6.0    9.0       FF       II
8   I    7.0   10.0       GG       JJ
gold_cy
  • 13,648
  • 3
  • 23
  • 45
1

Map would be more efficient in this case

DF3 = DF1.copy()
DF3['outcome1'] = DF1['test1'].map(DF2.set_index('No.')['outcome'])
DF3['outcome2'] = DF1['test2'].map(DF2.set_index('No.')['outcome'])

    map test1   test2   outcome1    outcome2
0   A   NaN     NaN     NaN         NaN
1   B   NaN     5.0     NaN         EE
2   C   1.0     6.0     AA          FF
3   D   2.0     7.0     BB          GG
4   E   3.0     NaN     CC          NaN
5   F   4.0     NaN     DD          NaN
6   G   5.0     8.0     EE          HH
7   H   6.0     9.0     FF          II
8   I   7.0     10.0    GG          JJ
Vaishali
  • 37,545
  • 5
  • 58
  • 86