Merging 2 Dataframes does not give me the expected number of rows

Question

I'm trying to merge 2 DataFrames following Pandas Merging 101 .

To train, I tried on 2 smalls Dataframe:

df1
    A   B    C
0   4   64   51
1   7   1    6
2   47  21   56
3   96  16   4
4   2   6    4

df2
    A    B    C    D
0   8    6    8    7
1   7    1    14   5
2   8    8    8    8
3   96   16   32   4

I want to check all values of columns A and B of df1, if they correspond to columns A and B of df2, then I create a new column D in dataFrame df2 that correspond to the value of D. If the values does not correspond to any rows in df2 then I store Nan.

I used this:

df3= df1.merge(df2[['A', 'B','D']], on= ['A', 'B'], how= 'left')

And this give me the expected output

OUTPUT:

df3

   A   B   C   D
0  4   64  51  NaN
1  7   1   6   5.0
2  47  21  56  NaN
3  96  16  4   4.0
4  2   6   4   NaN

Now I tried the same procedure but on real world datasets:

df1.shape  -> (1000, 92)

df2.shape  -> (1744, 22)

df3= df1.merge(df2[['NUM_WB', 'WB','TENSION', 'COURANT']], on= ['NUM_WB', 'WB','TENSION'], how= 'left')

df3.shape  -> (72162, 93)

I expected df3 to have the same number of rows as df1. How to explain this?

This means you have duplicated rows on the subset of merging columns, you can drop the duplicates in `df2` on this subset or accept to have more combinations. — mozway, Oct 13 '22 at 07:14

Merging 2 Dataframes does not give me the expected number of rows

0 Answers0