3

I'd like to combine two dataframes using their similar column 'A':

>>> df1
    A   B
0   I   1
1   I   2
2   II  3

>>> df2
    A   C
0   I   4
1   II  5
2   III 6

To do so I tried using:

merged = pd.merge(df1, df2, on='A', how='outer')

Which returned:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 4
2   II  3.0 5
3   III NaN 6

However, since df2 only contained one value for A == 'I', I do not want this value to be duplicated in the merged dataframe. Instead I would like the following output:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 NaN
2   II  3.0 5
3   III NaN 6

What is the best way to do this? I am new to python and still slightly confused with all the join/merge/concatenate/append operations.

Martijn
  • 33
  • 1
  • 4
  • How do you want it to know that the 4 value in C belonged with the 1 in B? – flyingmeatball Nov 22 '17 at 16:14
  • @flyingmeatball, I actually do not care about the relationship between columns B and C. I just want 4 to be inserted in the first row of A == 'I' where column C holds no value yet. – Martijn Nov 22 '17 at 16:33

1 Answers1

8

Let us create a new variable g, by cumcount

df1['g']=df1.groupby('A').cumcount()
df2['g']=df2.groupby('A').cumcount()
df1.merge(df2,how='outer').drop('g',1)
Out[62]: 
     A    B    C
0    I  1.0  4.0
1    I  2.0  NaN
2   II  3.0  5.0
3  III  NaN  6.0
BENY
  • 317,841
  • 20
  • 164
  • 234