Merge dataframes without duplicating rows in python pandas

Question

I'd like to combine two dataframes using their similar column 'A':

>>> df1
    A   B
0   I   1
1   I   2
2   II  3

>>> df2
    A   C
0   I   4
1   II  5
2   III 6

To do so I tried using:

merged = pd.merge(df1, df2, on='A', how='outer')

Which returned:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 4
2   II  3.0 5
3   III NaN 6

However, since df2 only contained one value for A == 'I', I do not want this value to be duplicated in the merged dataframe. Instead I would like the following output:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 NaN
2   II  3.0 5
3   III NaN 6

What is the best way to do this? I am new to python and still slightly confused with all the join/merge/concatenate/append operations.

How do you want it to know that the 4 value in C belonged with the 1 in B? — flyingmeatball, Nov 22 '17 at 16:14
@flyingmeatball, I actually do not care about the relationship between columns B and C. I just want 4 to be inserted in the first row of A == 'I' where column C holds no value yet. — Martijn, Nov 22 '17 at 16:33

score 8 · Accepted Answer · answered Nov 22 '17 at 16:14

8

Let us create a new variable g, by cumcount

df1['g']=df1.groupby('A').cumcount()
df2['g']=df2.groupby('A').cumcount()
df1.merge(df2,how='outer').drop('g',1)
Out[62]: 
     A    B    C
0    I  1.0  4.0
1    I  2.0  NaN
2   II  3.0  5.0
3  III  NaN  6.0

answered Nov 22 '17 at 16:14

BENY

317,841
20
164
234

1

Thank you, that did the job! – Martijn Nov 22 '17 at 16:27
@Martijn Yw~ :-) – BENY Nov 22 '17 at 16:30
@Bharath aha:-) – BENY Nov 22 '17 at 16:31
1

@Bharath I learn it from piR , also he speed it up :-) .....https://stackoverflow.com/questions/40575486/merge-pandas-dataframe-with-key-duplicates – BENY Nov 22 '17 at 16:34
@Bharath, Sorry! I missed that topic. – Martijn Nov 22 '17 at 16:34
piR is a legend, he might already have used all possible awesome approaches – Bharath M Shetty Nov 22 '17 at 16:36
@Bharath yeah , he is :-) – BENY Nov 22 '17 at 16:37

Merge dataframes without duplicating rows in python pandas

1 Answers1

Linked

Related