Add column from another dataframe if two columns match

Question

I am working with a huge volume of data and trying to map values between two dataframes. I am looking for a solution with better time complexity.

Here I am trying to find each Code from df2 in df1 and, where the values match, take the MLC Code from df1.

df1

Code	MLC Code
1	8
2	66
8	62
4	66

df2

Code
1
2
3
4
4
8

Result df

Code	MLC Code
1	8
2	66
3	NA
4	62
4	NA
8	66

Here is the code I am using to perform this task, but it takes a lot of time to compute.

for i, j in enumerate(df2["Code"]):
    for x, y in enumerate(df1["Code"]):
        if j == y:
            # Assign the matching value; this nested loop is O(len(df2) * len(df1))
            df2.loc[i, "MLC Code"] = df1.loc[x, "MLC Code"]

Iterating through dataframes is an antipattern; you can read more about it in this great [answer](https://stackoverflow.com/a/55557758/4147687). You should look at using a merge, join or concat instead. The [docs](https://pandas.pydata.org/docs/user_guide/merging.html) outline the differences between them; it looks like a merge or join will do the trick for you. – Cold Fish, Sep 12 '22 at 22:20
Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — BeRT2me, Sep 12 '22 at 22:45

score 3 · Answer 1 · answered Sep 12 '22 at 22:20

Try this

df2.merge(df1[['Code', 'MLC Code']], how='left', on='Code')
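
A minimal, self-contained sketch of that merge using the sample data from the question (the name `result` is only chosen here for illustration):

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"Code": [1, 2, 8, 4], "MLC Code": [8, 66, 62, 66]})
df2 = pd.DataFrame({"Code": [1, 2, 3, 4, 4, 8]})

# Left merge: keep every row of df2, in its original order,
# and attach the matching "MLC Code" from df1 where one exists.
result = df2.merge(df1[["Code", "MLC Code"]], how="left", on="Code")

# merge returns a new DataFrame; df2 itself is not modified.
print(result)
```

With how='left', Code 3 has no match in df1 and gets NaN, and both rows with Code 4 get 66. The lookup is handled inside pandas rather than in a Python-level nested loop.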

Alex


Andre Nevares · Answer 2 · 2022-09-12T23:42:44.270

I will try to reproduce the process...

First, import the module and create the data:

import pandas as pd

# Your sample data
data_1 = {'Code': [1,2,8,4], 'MLC Code': [8,66,62,66]}
data_2 = {'Code': [1,2,3,4,4,8]}

# Create Dataframes from your data
df1 = pd.DataFrame(data_1)
df2 = pd.DataFrame(data_2)

Use merge

df_out = pd.merge(df1, df2, how='right', left_on='Code', right_on='Code')

You will get this output:

    Code    MLC Code
0   1        8.0
1   2       66.0
2   3        NaN
3   4       66.0
4   4       66.0
5   8       62.0

If you do not want the default integer index, you can set Code as the index:

df_out = pd.merge(df1, df2, how='right', left_on='Code', right_on='Code').set_index('Code')

    MLC Code
Code    
1   8.0
2   66.0
3   NaN
4   66.0
4   66.0
8   62.0
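
The values print as 8.0, 66.0, ... because the NaN for Code 3 forces the column to float. If you would rather keep integers, one option (a sketch, assuming a pandas version with the nullable "Int64" dtype) is to convert the column after the merge:

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"Code": [1, 2, 8, 4], "MLC Code": [8, 66, 62, 66]})
df2 = pd.DataFrame({"Code": [1, 2, 3, 4, 4, 8]})

df_out = pd.merge(df1, df2, how="right", on="Code")

# The missing value makes "MLC Code" float64 after the merge.
# The nullable "Int64" dtype stores integers and shows <NA> for the gap.
df_out["MLC Code"] = df_out["MLC Code"].astype("Int64")

print(df_out)
```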

Also... The solution given by @alex does the job!!!!

score 0 · Answer 3 · answered Sep 12 '22 at 22:42

We can use cumcount with groupby to create the sub-merge key:

out = df2.assign(key = df2.groupby('Code').cumcount()).\
           merge(df1.assign(key = df1.groupby('Code').cumcount()),how='left')
Out[106]: 
   Code  key  MLC Code
0     1    0       8.0
1     2    0      66.0
2     3    0       NaN
3     4    0      66.0
4     4    1       NaN
5     8    0      62.0
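
To see why the second 4 ends up as NaN, here is the same idea split into steps on the question's sample data (the names `left` and `right` are only for illustration):

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"Code": [1, 2, 8, 4], "MLC Code": [8, 66, 62, 66]})
df2 = pd.DataFrame({"Code": [1, 2, 3, 4, 4, 8]})

# cumcount numbers the rows within each "Code" group:
# 0 for the first occurrence of a code, 1 for the second, and so on.
left = df2.assign(key=df2.groupby("Code").cumcount())
right = df1.assign(key=df1.groupby("Code").cumcount())
print(left)   # the second row with Code 4 gets key 1
print(right)  # every code appears once in df1, so every key is 0

# Merging on both columns pairs the n-th occurrence of a code in df2
# with the n-th occurrence of that code in df1.
out = left.merge(right, how="left", on=["Code", "key"])
print(out)
```

Since df1 has only one row for Code 4, the pair (4, 1) has no partner and stays NaN, so each df1 row is used at most once.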

BENY


WOW.... Always nice to learn! – Andre Nevares Sep 12 '22 at 23:38
@AndreNevares happy coding ~ – BENY Sep 13 '22 at 01:09
