Say I have the following dataframes:
>>> df1 = pd.DataFrame({'fruit':['apple','orange','orange'],'taste':['sweet','sweet','sour']})
>>> df1
fruit taste
0 apple sweet
1 orange sweet
2 orange sour
>>> df2 = pd.DataFrame({'fruit':['apple','orange','orange'],'price':['high','low','low']})
>>> df2
fruit price
0 apple high
1 orange low
2 orange low
When I do df3=df1.merge(df2,on='fruit')
, I got the following result:
fruit taste price
0 apple sweet high
1 orange sweet low
2 orange sweet low
3 orange sour low
4 orange sour low
Here it looks like 2 duplicate rows were created; instead, I would expect something like
fruit taste price
0 apple sweet high
1 orange sweet low
3 orange sour low
How should I understand this behavior and how to obtain the result I was looking for?