0

I have two dataframe namely df1 and df2. Both has around 30M rows so its pretty large.

The df1 has 3 columns: A B C where df2 has two columns A D.

The column A is mostly same but is unordered and has some values. For example df2['A'] missing some values compare to df1['A'].

what I need

I need to merge two dataframe so that df1 has column D but for those missing values, I need it to be NaN.

Example

df1

A B C
x 1 2
y 5 6
z 4 4 

df2

A D
x 10
z 20

Merged Dataframe

A B C D
x 1 2 10
y 5 6 NaN
z 4 4 20

What I have tried

I tried using df1.merge(df2, how='right', on="A") but couldn't figure out how to make those missing values NaN. I also tried to iterate over the rows of df2 and add the column D values to df1 but it will take days to finish since the both dataframes are big.

  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Naveed Nov 03 '22 at 14:49

1 Answers1

0

You basically have it, just switch to a left merge so it keeps everything from df1 and fills from df2:

df1 = pd.DataFrame({'A':['x','y','z'], 'B':[1,5,4], 'C':[2,6,4]})
df2 = pd.DataFrame({'A':['x','z'], 'D':[10,20]})
df1.merge(df2, how='left', on='A')

Returns:

    A   B   C   D
0   x   1   2   10.0
1   y   5   6   NaN
2   z   4   4   20.0
Vincent Rupp
  • 617
  • 5
  • 13