I'm trying to merge 2 DataFrames following Pandas Merging 101 .
To train, I tried on 2 smalls Dataframe:
df1
A B C
0 4 64 51
1 7 1 6
2 47 21 56
3 96 16 4
4 2 6 4
df2
A B C D
0 8 6 8 7
1 7 1 14 5
2 8 8 8 8
3 96 16 32 4
I want to check all values of columns A
and B
of df1
, if they correspond to columns A
and B
of df2
, then I create a new column D
in dataFrame df2
that correspond to the value of D
. If the values does not correspond to any rows in df2
then I store Nan.
I used this:
df3= df1.merge(df2[['A', 'B','D']], on= ['A', 'B'], how= 'left')
And this give me the expected output
OUTPUT:
df3
A B C D
0 4 64 51 NaN
1 7 1 6 5.0
2 47 21 56 NaN
3 96 16 4 4.0
4 2 6 4 NaN
Now I tried the same procedure but on real world datasets:
df1.shape -> (1000, 92)
df2.shape -> (1744, 22)
df3= df1.merge(df2[['NUM_WB', 'WB','TENSION', 'COURANT']], on= ['NUM_WB', 'WB','TENSION'], how= 'left')
df3.shape -> (72162, 93)
I expected df3 to have the same number of rows as df1. How to explain this?