0

Let's say I have two dataframes df1 and df2 as below

a = [1,1,1,2,3,4,4,5,6]
df1 = pd.DataFrame(a, columns=["id"])

    id
0   1
1   1
2   1
3   2
4   3
5   4
6   4
7   5
8   6

and

x = [1,2,3,4,5,6]
y = ["apple","orange","banana","lemon","kiwi","melon"]
df2 = pd.DataFrame(list(zip(x, y)), columns=["fruit_id", "fruit_name"])

    fruit_id  fruit_name
0   1         apple
1   2         orange
2   3         banana
3   4         lemon
4   5         kiwi
5   6         melon

I want to match the fruit_id from df2 to the id in df1 and generate a new column like below

    id  result
0   1   apple
1   1   apple
2   1   apple
3   2   orange
4   3   banana
5   4   lemon
6   4   lemon
7   5   kiwi
8   6   melon
wjandrea
  • 28,235
  • 9
  • 60
  • 81
Yippee
  • 237
  • 1
  • 10
  • A possible solution: `df1.merge(df2, left_on='id', right_on='fruit_id').drop('id', axis=1)`. – PaulS Jul 18 '23 at 17:33
  • 1
    You really don't need merge in this case, use `map` it is much more efficient. `df1['result'] = df1['id'].map(df2.set_index('fruit_id')['fruit_name'])` – Scott Boston Jul 18 '23 at 18:09

1 Answers1

1

You are looking for pd.merge(), which works like a SQL join

merged_df = pd.merge(df1, df2, how='left', left_on='id', right_on='fruit_id').drop(columns='fruit_id')

The merge leaves the columns from each dataframe in it initially, so you want to drop the now-unneeded fruit_id column.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
scotscotmcc
  • 2,719
  • 1
  • 6
  • 29