
I have two Google Sheets: one has a column with ~8,000 names and the other has a column with ~4,000 names.

Using Python, I would like to match those two columns against each other, make the names that have a similarity ratio between 51% and 99% identical, and leave names with a lower similarity unchanged.
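One common way to measure the "similarity ratio" described above is the `ratio()` method of Python's standard-library `difflib.SequenceMatcher`, which returns 2*M/T (M = number of matched characters, T = combined length of both strings). The sketch below assumes that measure and scales it to a percentage; `similarity` and `should_correct` are hypothetical helper names, and other metrics (for example Levenshtein-based ones) would produce different percentages for the same pair.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity of two strings as a percentage (0-100).

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of matched
    characters and T is the total length of both strings.
    """
    return SequenceMatcher(None, a, b).ratio() * 100


def should_correct(score: float, threshold: float = 51.0) -> bool:
    """True for scores from `threshold` up to (but not including) 100%.

    A score of 100% means the names are already identical, so nothing changes.
    """
    return threshold <= score < 100.0


score = similarity("Microsoft", "Microoft")
print(round(score, 1), should_correct(score))  # prints: 94.1 True
```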

Why? I need to merge a lot of data, and some of the names don't match exactly. For example, I have:

Sheet 1 - Column 1 "Name"

Apple LTD.

Microsoft

IBM Security

Stack Overflow

Sheet 2 - Column 1 "Name"

Apple LT.

Microoft

IBM Seurit

Stak Overfow

So I would like sheet 2 to automatically take sheet 1's name, since sheet 1 has the correct one.
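Applied to the example above, a minimal sketch (assuming the two columns have already been read into Python lists; loading them from Google Sheets is not shown) replaces each sheet-2 name with its best-scoring sheet-1 name when the `difflib` ratio is at least 0.51, and otherwise keeps it. `correct_names` is a hypothetical helper, and `"Google"` is a made-up extra entry included only to show a name that stays unchanged.

```python
from difflib import SequenceMatcher

# Names from the question; "Google" is a hypothetical extra entry with no
# close counterpart in sheet 1.
sheet1_names = ["Apple LTD.", "Microsoft", "IBM Security", "Stack Overflow"]
sheet2_names = ["Apple LT.", "Microoft", "IBM Seurit", "Stak Overfow", "Google"]


def correct_names(reference, names, threshold=0.51):
    """Replace each name with its most similar reference name when the
    similarity ratio (0-1) is at least `threshold`; otherwise keep it."""
    corrected = []
    for name in names:
        # Score every reference name once and keep the highest-scoring one.
        best_score, best_ref = max(
            (SequenceMatcher(None, name, ref).ratio(), ref) for ref in reference
        )
        corrected.append(best_ref if best_score >= threshold else name)
    return corrected


print(correct_names(sheet1_names, sheet2_names))
```

This compares every pair, so with ~4,000 × ~8,000 names it performs about 32 million `ratio()` calls and may be slow. It also assumes the highest-scoring reference name is always the right one, which is worth spot-checking on the real data.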

osu
  • How would you decide which is correct? – WiLL_K Jan 07 '20 at 15:40
  • @WiLL_K the first sheet is the correct one. (I will tell the script that the first sheet is the one corrections should be taken from.) – osu Jan 07 '20 at 15:42
  • I can think of a solution, but can you add at least a few names to the question so I can try it out? – WiLL_K Jan 07 '20 at 15:43
  • Have a look at https://recordlinkage.readthedocs.io/en/latest/about.html#introduction – WiLL_K Jan 07 '20 at 15:45
  • @WiLL_K I've modified the question. – osu Jan 07 '20 at 15:46
  • Does this answer your question? [is it possible to do fuzzy match merge with python pandas?](https://stackoverflow.com/questions/13636848/is-it-possible-to-do-fuzzy-match-merge-with-python-pandas) – ignoring_gravity Jan 07 '20 at 16:06
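The linked question approaches this with pandas. As a dependency-free sketch of the same fuzzy-merge idea, `difflib.get_close_matches(word, possibilities, n, cutoff)` returns the best candidates scoring at least `cutoff`; CPython's implementation checks the cheap upper bounds `real_quick_ratio()` and `quick_ratio()` before computing the full `ratio()`, which can reduce the cost of comparing every pair. The row dictionaries, their extra columns, and `fuzzy_merge` are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical rows from each sheet; only "Name" comes from the question,
# the "Country" and "Revenue" columns are made up for illustration.
sheet1_rows = [
    {"Name": "Apple LTD.", "Country": "US"},
    {"Name": "Microsoft", "Country": "US"},
    {"Name": "IBM Security", "Country": "US"},
]
sheet2_rows = [
    {"Name": "Apple LT.", "Revenue": 10},
    {"Name": "Microoft", "Revenue": 20},
    {"Name": "Google", "Revenue": 30},
]


def fuzzy_merge(left, right, key="Name", cutoff=0.51):
    """Merge `right` rows into the `left` row whose `key` is the closest match.

    Assumes names in `left` (the correct sheet) are unique. Returns the merged
    rows and the `right` rows that had no match scoring >= cutoff.
    """
    merged = {row[key]: dict(row) for row in left}
    candidates = list(merged)
    unmatched = []
    for row in right:
        # n=1 keeps only the best match; an empty list means nothing reached cutoff.
        match = get_close_matches(row[key], candidates, n=1, cutoff=cutoff)
        if match:
            # Copy the other columns; the correct name from `left` is kept.
            merged[match[0]].update({k: v for k, v in row.items() if k != key})
        else:
            unmatched.append(row)
    return list(merged.values()), unmatched


rows, leftovers = fuzzy_merge(sheet1_rows, sheet2_rows)
print(rows)
print(leftovers)
```

If several sheet-2 rows match the same sheet-1 name, later rows overwrite earlier values in this sketch; unmatched rows are returned separately so they can be reviewed by hand.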
