How to compare two columns of values between two dataframes

Question

Let's say I have two dataframes df1 and df2 as below

a = [1,1,1,2,3,4,4,5,6]
df1 = pd.DataFrame(a, columns=["id"])

    id
0   1
1   1
2   1
3   2
4   3
5   4
6   4
7   5
8   6

and

x = [1,2,3,4,5,6]
y = ["apple","orange","banana","lemon","kiwi","melon"]
df2 = pd.DataFrame(list(zip(x, y)), columns=["fruit_id", "fruit_name"])

    fruit_id  fruit_name
0   1         apple
1   2         orange
2   3         banana
3   4         lemon
4   5         kiwi
5   6         melon

I want to match the fruit_id from df2 to the id in df1 and generate a new column like below

    id  result
0   1   apple
1   1   apple
2   1   apple
3   2   orange
4   3   banana
5   4   lemon
6   4   lemon
7   5   kiwi
8   6   melon

A possible solution: `df1.merge(df2, left_on='id', right_on='fruit_id').drop('id', axis=1)`. — PaulS, Jul 18 '23 at 17:33
You really don't need merge in this case, use `map` it is much more efficient. `df1['result'] = df1['id'].map(df2.set_index('fruit_id')['fruit_name'])` — Scott Boston, Jul 18 '23 at 18:09

score 1 · Answer 1 · edited Jul 18 '23 at 17:39

1

You are looking for pd.merge(), which works like a SQL join

merged_df = pd.merge(df1, df2, how='left', left_on='id', right_on='fruit_id').drop(columns='fruit_id')

The merge leaves the columns from each dataframe in it initially, so you want to drop the now-unneeded fruit_id column.

edited Jul 18 '23 at 17:39

wjandrea

28,235
9
60
81

answered Jul 18 '23 at 17:34

scotscotmcc

2,719
1
6
29

How to compare two columns of values between two dataframes

1 Answers1