
Sorry guys, I know this is a very basic question; I'm just a beginner.

In [55]: df1
Out[55]:
   x  y
a  1  3
b  2  4
c  3  5
d  4  6
e  5  7

In [56]: df2
Out[56]:
   y  z
b  1  9
c  3  8
d  5  7
e  7  6
f  9  5

pd.merge(df1, df2) gives:

In [57]: pd.merge(df1, df2)
Out[57]:
   x  y  z
0  1  3  8
1  3  5  7
2  5  7  6

I'm confused about how merge works. What do the '0', '1', '2' mean? For example, when the index is 0, why is x 1, y 3 and z 8?

2 Answers


You get that due to defaults for pd.merge:

merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False, sort=False, suffixes=('_x',
'_y'), copy=True, indicator=False)

on : label or list
    Field names to join on. Must be found in both DataFrames. If on is
    None and not merging on indexes, then it merges on the intersection of
    the columns by default.

You haven't passed anything to on, so it merges on the intersection of the columns by default, which here is the single column y. Your df1 and df2 have different indices, so if you want the result to carry index labels from one of them instead of a fresh 0, 1, 2, you should specify that:

In [43]: pd.merge(df1, df2)
Out[43]:
   x  y  z
0  1  3  8
1  3  5  7
2  5  7  6

In [44]: pd.merge(df1, df2, on='y', left_index=True)
Out[44]:
   x  y  z
c  1  3  8
d  3  5  7
e  5  7  6

In [45]: pd.merge(df1, df2, on='y', right_index=True)
Out[45]:
   x  y  z
a  1  3  8
c  3  5  7
e  5  7  6
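
In case combining on= with left_index= or right_index= behaves differently, or is rejected, in your pandas version, here is a minimal sketch of another way to keep df1's labels. It assumes the frames from the question; the variable name kept is only illustrative. The idea is to move the index into an ordinary column before the merge and restore it afterwards.

```python
import pandas as pd

# Rebuild the two frames from the question.
df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=list("bcdef"))

# reset_index() turns df1's unnamed index into a column called "index",
# so the labels survive the column-on-column merge.
kept = (
    df1.reset_index()
       .merge(df2, on="y", how="inner")
       .set_index("index")
)
kept.index.name = None  # drop the leftover "index" label

print(kept)
```

The rows are the same three matches as in pd.merge(df1, df2), but each one is labelled with the df1 row it came from.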
Anton Protopopov

pd.merge joins two dataframes in much the same way that a 'JOIN' statement combines two relations in a relational database.

When you call pd.merge(df1, df2) without any other arguments, every parameter takes its default value. The 'how' argument defaults to 'inner', so only rows whose key appears in both dataframes are kept. Since no key was given, the key is the column name common to df1 and df2, which is 'y'.

So merge looks for values of 'y' that occur in both df1 and df2 (3, 5 and 7) and creates a new dataframe with the columns 'x', 'y', 'z': 'y' holds the common values 3, 5, 7, 'x' holds the values from the df1 rows where y is 3, 5, 7, and 'z' holds the values from the df2 rows where y is 3, 5, 7. That is why row 0 has x = 1, y = 3 and z = 8.

The index of the new dataframe is 0, 1, 2 because you merged on a column and did not ask for an index-based merge with left_index or right_index (both False by default), so the original labels are dropped and the rows are simply numbered.
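
To make those defaults concrete, here is a small sketch that rebuilds the frames from the question and checks that the bare call gives the same result as spelling out how='inner' and the shared column. The names shared, implicit and explicit are only illustrative.

```python
import pandas as pd

# Rebuild the two frames from the question.
df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=list("bcdef"))

# The columns both frames have in common; this is what merge joins on
# when no key is passed.
shared = df1.columns.intersection(df2.columns).tolist()

implicit = pd.merge(df1, df2)                          # all defaults
explicit = pd.merge(df1, df2, how="inner", on=shared)  # defaults spelled out

print(shared)                     # the join key(s)
print(implicit.equals(explicit))  # whether both calls agree
print(implicit)
```

Both calls return the three rows where y is 3, 5 and 7, numbered 0, 1, 2.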

PJay