update data frame new column based on match from other dataframe

Question

I have two dataframes df1,df2 like this: df1: UID are unique

UID STATE
123 WB
456 GH
789 TY
156 OU

df2: UID can be duplicate

UID TYPE    GENR    STATE
123 TYP1    G1  
156 TYP6    G5  
456 TYP3    G6  
789 TYP4    G7  
156 TYP1    G3  
789 TYP5    G7  
123 TYP2    G3

Now, I want update the column "STATE" in df2 based on match in df1. df2:

UID TYPE    GENR    STATE
123 TYP1    G1      WB
156 TYP6    G5      OU
456 TYP3    G6      GH
789 TYP4    G7      TY
156 TYP1    G3      OU
789 TYP5    G7      TY
123 TYP2    G3      WB

How this can be achieved in most efficient way? I tried with this but its' taking too much time for 22M records.

dict_uid_state = dict(zip(df1.UID,df1.STATE))
df2.loc["STATE"] = " "
for each_uid in dict_uid_state:
    query_index = df_input.query('UID == @each_uid',engine='python').index
    df2.loc[query_index, "STATE"] = dict_uid_state[each_uid]

Shubham Sharma · Accepted Answer · 2020-04-02T05:00:20.140

1

You can use df.merge to left merge/join the two dataframes based on the UID column values.

Try this:

df2 = df2[["UID", "TYPE", "GENR"]].merge(df1, on="UID", how="left")

Output:

>>> print(df2)

  UID  TYPE GENR STATE
0  123  TYP1   G1    WB
1  156  TYP6   G5    OU
2  456  TYP3   G6    GH
3  789  TYP4   G7    TY
4  156  TYP1   G3    OU
5  789  TYP5   G7    TY
6  123  TYP2   G3    WB

edited Apr 02 '20 at 05:00

answered Apr 02 '20 at 04:30

Shubham Sharma

68,127
6
24
53

1

@anky_91 yep ;). – Shubham Sharma Apr 02 '20 at 04:59
better close and add comment with solution here – jezrael Apr 02 '20 at 05:10

update data frame new column based on match from other dataframe

1 Answers1