
I am working with a huge volume of data and trying to map values between two dataframes. I am looking for an approach with better time complexity.

I am trying to match each Code in df2 against the Code values in df1 and, where they match, take the MLC Code from df1.

df1

Code  MLC Code
   1         8
   2        66
   8        62
   4        66

df2

Code
1
2
3
4
4
8

Result df

Code  MLC Code
   1         8
   2        66
   3        NA
   4        62
   4        NA
   8        66

Here is the code I am using to perform this task, but it takes a long time to compute.

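# Slow approach: every row of df2 is compared with every row of df1, O(n * m).
# Assumes both dataframes have the default 0..n-1 index.
df2["MLC Code"] = None  # create the target column before filling it
for i, j in enumerate(df2["Code"]):
    for x, y in enumerate(df1["Code"]):
        if j == y:
            # assignment (=), not comparison (==); .loc avoids chained indexing
            df2.loc[i, "MLC Code"] = df1.loc[x, "MLC Code"]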
Monil Shah
  • Iterating through dataframes is an antipattern; you can read more about it in this great [answer](https://stackoverflow.com/a/55557758/4147687). You should look at using a merge, join or concat. The [docs](https://pandas.pydata.org/docs/user_guide/merging.html) outline the differences between them. It looks like a merge or join will do the trick for you. – Cold Fish Sep 12 '22 at 22:20
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – BeRT2me Sep 12 '22 at 22:45

3 Answers


Try this:

df2.merge(df1[['Code', 'MLC Code']], how='left', on='Code')
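For a version that runs end to end, here is a minimal sketch using the sample data from the question. The `validate` argument is an addition that is not in the one-liner above; it assumes each Code appears at most once in df1 and makes pandas raise an error otherwise, instead of silently duplicating rows.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'Code': [1, 2, 8, 4], 'MLC Code': [8, 66, 62, 66]})
df2 = pd.DataFrame({'Code': [1, 2, 3, 4, 4, 8]})

# Left merge: keep every row of df2 (in its original order) and look up
# 'MLC Code' from df1; codes missing from df1 get NaN.
# validate='many_to_one' is an added safeguard: it raises MergeError
# if a Code is duplicated in df1.
result = df2.merge(df1[['Code', 'MLC Code']], how='left', on='Code',
                   validate='many_to_one')
print(result)
```

With this data, both rows with Code 4 receive 66, and Code 3 is left as NaN because it does not occur in df1.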
Alex

I will try to reproduce the process...

First, import the module and create the data:

import pandas as pd

# Your sample data
data_1 = {'Code': [1,2,8,4], 'MLC Code': [8,66,62,66]}
data_2 = {'Code': [1,2,3,4,4,8]}

# Create Dataframes from your data
df1 = pd.DataFrame(data_1)
df2 = pd.DataFrame(data_2)

Then use merge, keeping every row of df2 (the right-hand dataframe):

df_out = pd.merge(df1, df2, how='right', on='Code')

You will get this output:

    Code    MLC Code
0   1        8.0
1   2       66.0
2   3        NaN
3   4       66.0
4   4       66.0
5   8       62.0
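The MLC Code column shows 8.0 rather than 8 because the NaN for the unmatched row forces the column to a float dtype. If integer values are preferred, one option (an addition, not part of the original answer) is pandas' nullable integer dtype `'Int64'`, sketched below; the missing value then appears as `<NA>`.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'Code': [1, 2, 8, 4], 'MLC Code': [8, 66, 62, 66]})
df2 = pd.DataFrame({'Code': [1, 2, 3, 4, 4, 8]})

df_out = pd.merge(df1, df2, how='right', on='Code')

# Convert the float column to the nullable integer dtype so that
# matched values stay integers and the unmatched row holds <NA>.
df_out['MLC Code'] = df_out['MLC Code'].astype('Int64')
print(df_out)
print(df_out.dtypes)
```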

If you want Code as the index instead of the default integer index, you can do this:

df_out = pd.merge(df1, df2, how='right', on='Code').set_index('Code')

      MLC Code
Code
1          8.0
2         66.0
3          NaN
4         66.0
4         66.0
8         62.0

Also, the solution given by @alex does the job.

Andre Nevares

We can use cumcount with groupby to create a sub-merge key, so that each repeated Code in df2 is matched with at most one row of df1:

out = (
    df2.assign(key=df2.groupby('Code').cumcount())
       .merge(df1.assign(key=df1.groupby('Code').cumcount()),
              how='left', on=['Code', 'key'])
)
Out[106]: 
   Code  key  MLC Code
0     1    0       8.0
1     2    0      66.0
2     3    0       NaN
3     4    0      66.0
4     4    1       NaN
5     8    0      62.0
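To see why the second 4 stays unmatched, it helps to look at the helper column on its own. The sketch below uses the sample data from the question; the names `left` and `right` are introduced here only for illustration. `key` is the occurrence number of each Code within its own dataframe, so the second 4 in df2 gets key 1, while df1 only contains the pair (4, 0).

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'Code': [1, 2, 8, 4], 'MLC Code': [8, 66, 62, 66]})
df2 = pd.DataFrame({'Code': [1, 2, 3, 4, 4, 8]})

# cumcount numbers the rows within each Code group: 0 for the first
# occurrence, 1 for the second, and so on.
left = df2.assign(key=df2.groupby('Code').cumcount())
right = df1.assign(key=df1.groupby('Code').cumcount())
print(left)

# Merging on both columns pairs the n-th occurrence of a Code in df2 with
# the n-th occurrence in df1, so each df1 row is used at most once.
out = left.merge(right, how='left', on=['Code', 'key'])
print(out)
```

If the extra column is not wanted in the result, it can be removed afterwards with `out.drop(columns='key')`.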
BENY