I have two dataframes df1,df2 like this: df1: UID are unique
UID STATE
123 WB
456 GH
789 TY
156 OU
df2: UID can be duplicate
UID TYPE GENR STATE
123 TYP1 G1
156 TYP6 G5
456 TYP3 G6
789 TYP4 G7
156 TYP1 G3
789 TYP5 G7
123 TYP2 G3
Now, I want update the column "STATE" in df2 based on match in df1. df2:
UID TYPE GENR STATE
123 TYP1 G1 WB
156 TYP6 G5 OU
456 TYP3 G6 GH
789 TYP4 G7 TY
156 TYP1 G3 OU
789 TYP5 G7 TY
123 TYP2 G3 WB
How this can be achieved in most efficient way? I tried with this but its' taking too much time for 22M records.
dict_uid_state = dict(zip(df1.UID,df1.STATE))
df2.loc["STATE"] = " "
for each_uid in dict_uid_state:
query_index = df_input.query('UID == @each_uid',engine='python').index
df2.loc[query_index, "STATE"] = dict_uid_state[each_uid]