Trying to update each row from df1 to df2 if an unique value is matched. If not, append the row to df2 and assign new ID column.
df1 (NO ID COLUMN):
unique_value Status Price
0 xyz123 bad 6.67
1 eff987 bad 1.75
2 efg125 okay 5.77
df2:
unique_value Status Price ID
0 xyz123 good 1.25 1000
1 xyz123 good 1.25 1000
2 xyz123 good 1.25 1000
3 xyz123 good 1.25 1000
4 xyz985 bad 1.31 1001
5 abc987 okay 4.56 1002
6 eff987 good 9.85 1003
7 asd541 excellent 8.85 1004
Desired output for updated df2:
unique_value Status Price ID
0 xyz123 bad 6.67 1000 <-updated
1 xyz123 bad 6.67 1000 <-updated
2 xyz123 bad 6.67 1000 <-updated
3 xyz123 bad 6.67 1000 <-updated
4 xyz985 bad 1.31 1001
5 abc987 okay 4.56 1002
6 eff987 bad 1.75 1003 <-updated
7 asd541 excellent 8.85 1004
8 efg125 okay 5.77 1005 <-appended
Here is what I have done so far:
for i in range(0, len(df1)):
if df1['unique_value'].isin(df2['unique_value'])[i] == True:
... update row in df2
else:
df2 = df2.append(i)
... assign row with new ID using pd.factorize and ID value at df2['ID'].max()+1
Note that I initial used pd.factorize
to assign ID based on unique_value for df2
with values starting at 1000
, 1001
(and so on) using this code: df2['ID'] = pd.factorize(df2['unique_value'])[0] + 1000
I tried using this solution (Updating a dataframe rows based on another dataframe rows), however it indexes my unique_value column, which prevents me from iterating another dataset moving forward.
Any way we can script this?
Thanks!