
I have `df1` with country and sales columns, and `df2` with country and sales columns. The country spelling is not exactly the same in `df1` and `df2`. How do I make the country spelling in `df1` match that of `df2` before merging the two dataframes?

The merge below doesn't give me all matches due to the different spellings: `pd.merge(df1, df2, on='country', how='left')`
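
To see which rows fail to match, one option is the `indicator=True` argument of `pd.merge`, which adds a `_merge` column. The sketch below is only an illustration: the sample data and the `_df1`/`_df2` suffixes are assumptions, since the real dataframes are not shown.

```python
import pandas as pd

# Hypothetical sample data; the real df1 and df2 are not shown in the question.
df1 = pd.DataFrame({"country": ["United States", "Bahrein", "France"],
                    "sales": [100, 20, 50]})
df2 = pd.DataFrame({"country": ["United States of America", "Bahrain", "France"],
                    "sales": [110, 25, 55]})

# indicator=True adds a "_merge" column recording where each key was found.
merged = pd.merge(df1, df2, on="country", how="left",
                  suffixes=("_df1", "_df2"), indicator=True)

# Rows marked "left_only" are df1 countries with no exact match in df2.
unmatched = merged.loc[merged["_merge"] == "left_only", "country"].tolist()
print(unmatched)
```

With this sample data, `unmatched` holds the two spellings that differ between the frames, which gives a short list to fix by hand or with a fuzzy matcher.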

Ophir
  • Really, the best way to do this is to have a clean database to start with. I realize this isn't always possible, but without knowing how the names will differ it will be nearly impossible to do. If you ONLY have the two variations of spelling, you could keep a third database mapping one spelling to the other. But if you have a multitude of spellings it could be impractical – Galo do Leste Feb 08 '23 at 01:27
  • When you say that country spelling is not the same, do you refer to minor differences (`Bahreïn` vs `Bahrain`) or totally different spellings (`Myanmar` vs `BURMA`)? – Sheldon Feb 08 '23 at 01:37
  • Look for `fuzzy match merge` solutions, e.g. https://stackoverflow.com/q/13636848/9987623. To get specific help with your data, post a sample of your data, the code you've tried, and the expected result. – AlexK Feb 08 '23 at 02:23
  • If you have a mapper from each country to its alternative spellings, you can left-merge your data on that list of alternatives, then merge against the list of all country alternatives – ifly6 Feb 08 '23 at 02:58
  • @Sheldon It's minor differences like United States vs United States of America or Bahrein vs Bahrain. – Ophir Feb 08 '23 at 17:24
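
For minor differences like the ones described in the comments, a minimal sketch of the mapping idea is to build a dictionary from each `df1` spelling to its closest `df2` spelling and apply it before merging. This uses `difflib.get_close_matches` from the standard library as one possible fuzzy matcher. The helper name `build_country_map`, the `cutoff` value of 0.6, and the sample data are all assumptions made for illustration, not something from the question.

```python
import difflib

import pandas as pd

# Hypothetical sample data; the real df1 and df2 are not shown in the question.
df1 = pd.DataFrame({"country": ["United States", "Bahrein", "France"],
                    "sales": [100, 20, 50]})
df2 = pd.DataFrame({"country": ["United States of America", "Bahrain", "France"],
                    "sales": [110, 25, 55]})


def build_country_map(source, targets, cutoff=0.6):
    """Map each name in source to its closest name in targets.

    Names with no candidate above the similarity cutoff map to themselves.
    """
    # Compare case-insensitively, but return the original target spelling.
    lowered = {t.lower(): t for t in targets}
    mapping = {}
    for name in source:
        hits = difflib.get_close_matches(name.lower(), list(lowered),
                                         n=1, cutoff=cutoff)
        mapping[name] = lowered[hits[0]] if hits else name
    return mapping


mapping = build_country_map(df1["country"].unique(), df2["country"].unique())

# Rewrite df1's spellings to df2's, then merge as in the question.
df1_clean = df1.assign(country=df1["country"].map(mapping))
merged = pd.merge(df1_clean, df2, on="country", how="left",
                  suffixes=("_df1", "_df2"))
print(merged)
```

It is worth printing `mapping` and reviewing it before merging, because similar names of different countries (for example Niger and Nigeria) can be matched to each other by mistake. Known exceptions can be corrected by hand with `mapping.update({...})`. This will not handle totally different names such as Myanmar vs Burma, which need an explicit lookup table.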

0 Answers