I'm starting to use pandas and I came across a problem that I don't know how to solve.
I have two dataframes.
The first contains car information, including the car model (column DESCR_MARCA_VEICULO):
df1
col1  col2  DESCR_MARCA_VEICULO
....  ....  'GM/CELTA 5 PORTAS SUPER'
....  ....  'VW/VOYAGE LS'
....  ....  'VW/GOL LS'
....  ....  'I/AUDI A4 2.0T FSI'
....  ....  'FIAT/UNO CS IE'
The second is a two-column lookup table (a "de-para", i.e. a from-to mapping) that pairs each car model with a unique ID, like this:
df2
ID  DESCR_MARCA_VEICULO
1   'GM - CELTA 5'
2   'VW - VOYAGE LS'
3   'VW - GOL LS'
4   'ACURA - INTEGRA GS 1.8'
5   'AUDI - 80 S2 AVANT'
The names in df2 don't necessarily follow a simple pattern relative to df1, such as replacing "/" with " - ": compare 'GM/CELTA 5 PORTAS SUPER' in df1 with 'GM - CELTA 5' in df2.
I have more than 5000 different car models in df1, which makes it impractical to check them case by case, and I need to bring the ID column from df2 into df1 (essentially a merge). However, when I merge the DataFrames, nothing matches because of these differences in the strings.
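To make the problem reproducible, here is a minimal sketch built only from the sample rows above. The exact merge call is not shown in this post, so the plain exact-match `merge` with `how="left"` is an assumption, and `col1`/`col2` are left out because their contents are elided.

```python
import pandas as pd

# Sample rows copied from the tables above (col1/col2 omitted).
df1 = pd.DataFrame({
    "DESCR_MARCA_VEICULO": [
        "GM/CELTA 5 PORTAS SUPER",
        "VW/VOYAGE LS",
        "VW/GOL LS",
        "I/AUDI A4 2.0T FSI",
        "FIAT/UNO CS IE",
    ]
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "DESCR_MARCA_VEICULO": [
        "GM - CELTA 5",
        "VW - VOYAGE LS",
        "VW - GOL LS",
        "ACURA - INTEGRA GS 1.8",
        "AUDI - 80 S2 AVANT",
    ],
})

# Assumed call: an exact-match left merge keeps every df1 row,
# but no df1 string equals any df2 string, so ID is NaN everywhere.
merged = df1.merge(df2, on="DESCR_MARCA_VEICULO", how="left")
n_matched = int(merged["ID"].notna().sum())
print(merged)
print("rows with an ID:", n_matched)
```

With these five rows, every ID in `merged` comes back missing, which is the behaviour described above.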
Is there any way to merge these DataFrames based on the similarity between the strings in the DESCR_MARCA_VEICULO column?
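One direction that might work, sketched under assumptions rather than as a definitive answer: normalize both sets of names, then use `difflib.get_close_matches` from the Python standard library to pick the most similar df2 name for each df1 name. The helper names `normalize` and `match_id`, the normalization rule, and the 0.6 cutoff are all hypothetical choices for illustration. The data is kept in plain lists here so the sketch has no dependencies.

```python
import difflib
import re

# Sample rows copied from the tables above.
df1_models = [
    "GM/CELTA 5 PORTAS SUPER",
    "VW/VOYAGE LS",
    "VW/GOL LS",
    "I/AUDI A4 2.0T FSI",
    "FIAT/UNO CS IE",
]
df2_rows = [
    (1, "GM - CELTA 5"),
    (2, "VW - VOYAGE LS"),
    (3, "VW - GOL LS"),
    (4, "ACURA - INTEGRA GS 1.8"),
    (5, "AUDI - 80 S2 AVANT"),
]

def normalize(text):
    """Uppercase, then replace each run of characters other than
    A-Z, 0-9 and '.' with a single space (assumed rule)."""
    return re.sub(r"[^A-Z0-9.]+", " ", text.upper()).strip()

# Map each normalized df2 name to its ID.
lookup = {normalize(name): model_id for model_id, name in df2_rows}

def match_id(name, cutoff=0.6):
    """Return the ID of the most similar df2 name, or None when
    no candidate reaches the similarity cutoff."""
    hits = difflib.get_close_matches(
        normalize(name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None

matched_ids = [match_id(name) for name in df1_models]
print(matched_ids)
```

On the five sample rows this assigns IDs 1, 2 and 3 to the first three models and leaves the Audi A4 and Fiat Uno rows unmatched. The Celta row only just clears the cutoff (similarity of about 0.61), so the threshold would need tuning on the real data, and uncertain matches would still need a manual check. In pandas the result could be attached with something like `df1["ID"] = [match_id(d) for d in df1["DESCR_MARCA_VEICULO"]]`.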
Thank you :)