I have two data frames
df1
model_detail series_detail
Nissan Sentra 430 Sedan -1123 Nissan Sentra
Acura ILX150 Base Sedan -17652 Acura ILX Sedan
Acura TLX300 Base FWD -89789 Acura TLX Sedan
Acura MDX450 Advanced Electric Hybrid AWD -55647 Acura MDX Sedan
df_2
Name model Series Amount
Acura MDX 450 Hybrid MDX 45900
Acura ILX 150 Sedan ILX 156700
Nissan Sentra 430 Electric Sentra 88897
Nissan Sunny 150 Sunny 12333
Acura MDX 450 Electric MDX 90000
I need to merge df1 and df2 to get the amount column in the df1.
Converting the column values similar to df2 is a very big process because I have 40 car brands and all are in a different format. It is impossible to write one code for all the brands as there would be many conditions.
Is there any other way where I can join these two data frames? Does fuzzy match help?
Or is there any other alternative? Because the Amount column is very sensitive it is important to have accuracy.
Can any one help me with a code to merge these type of data frames? Thanks a lot in advance.
What I have been doing brand-wise:
def model_extractfunc(row):
blocks = row['model_detail'].split()
mid_index = 1
if ('Hybrid' in blocks) or ('Electric' in blocks) or('Connect' in blocks):
mid_index = 3
return ' '.join(blocks[1:mid_index+1])
df1['extract_model'] = df1.apply(model_extractfunc, axis=1)
Again after extract_model column is generated - I have been checking the strings and replacing them according to the df2
repeating the same process for series_detail column to get Series column for df2
My approach is very time-taking and in-efficient as some brands have too many models and series details that I have to extract and if-else loops - would be inefficient in that cases.
Also, there is a possibility of Human errors as I have to manually check if all models are getting covered and replace names accordingly