2

I have two dataframes df1,df2 like this: df1: UID are unique

UID STATE
123 WB
456 GH
789 TY
156 OU

df2: UID can be duplicate

UID TYPE    GENR    STATE
123 TYP1    G1  
156 TYP6    G5  
456 TYP3    G6  
789 TYP4    G7  
156 TYP1    G3  
789 TYP5    G7  
123 TYP2    G3  

Now, I want update the column "STATE" in df2 based on match in df1. df2:

UID TYPE    GENR    STATE
123 TYP1    G1      WB
156 TYP6    G5      OU
456 TYP3    G6      GH
789 TYP4    G7      TY
156 TYP1    G3      OU
789 TYP5    G7      TY
123 TYP2    G3      WB

How this can be achieved in most efficient way? I tried with this but its' taking too much time for 22M records.

dict_uid_state = dict(zip(df1.UID,df1.STATE))
df2.loc["STATE"] = " "
for each_uid in dict_uid_state:
    query_index = df_input.query('UID == @each_uid',engine='python').index
    df2.loc[query_index, "STATE"] = dict_uid_state[each_uid]
Dcook
  • 899
  • 7
  • 32

1 Answers1

1

You can use df.merge to left merge/join the two dataframes based on the UID column values.

Try this:

df2 = df2[["UID", "TYPE", "GENR"]].merge(df1, on="UID", how="left")

Output:

>>> print(df2)

  UID  TYPE GENR STATE
0  123  TYP1   G1    WB
1  156  TYP6   G5    OU
2  456  TYP3   G6    GH
3  789  TYP4   G7    TY
4  156  TYP1   G3    OU
5  789  TYP5   G7    TY
6  123  TYP2   G3    WB
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53