Python programming finding similar names from a list of names

Question

I am working with a dataset of company names that may contain non-identical duplicates.

For example, the list may contain "company A" but also "c.o.m.p.a.n.y A" or "comp A".

Is there a Python script, using NLP for example, that can find similar names in a dataset?

Thanks in advance

I guess you would have to train another NLP network to preprocess the data for another network. In some cases, such as 'c.o.m.p.a.n.y', you can just remove the useless characters and keep only the letters. — Dmitry Barsukoff, Apr 13 '22 at 21:33
Yes, I know the general forms the duplicates take, but not all of them. — Amine, Apr 13 '22 at 22:16
Maybe these three links will help you: [link1](https://stackoverflow.com/questions/17388213/find-the-similarity-metric-between-two-strings), [link2](https://stackoverflow.com/questions/55162668/calculate-similarity-between-list-of-words), [link3](https://stackoverflow.com/questions/66919407/calculating-words-similarity-score-in-python) — I'mahdi, Apr 13 '22 at 22:57

score 2 · Answer 1 · answered Apr 13 '22 at 22:59

You can use spaCy to compute the similarity between two texts.

import spacy

nlp = spacy.load("en_core_web_md")  # use a medium or large model; the small models ship without word vectors
doc1 = nlp("Coca-Cola")
doc2 = nlp("Pepsi")

doc3 = nlp("Company Coca-Cola")
doc4 = nlp("Company Pepsi-Cola")


print(doc1, "<->", doc2, doc1.similarity(doc2))
print(doc3, "<->", doc4, doc3.similarity(doc4))

This prints the following similarities:

Coca-Cola <-> Pepsi 0.6684898494102074
Company Coca-Cola <-> Company Pepsi-Cola 0.934960639746236
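Note that these scores come from word vectors, so they reflect how related the meanings are rather than how alike the spellings are, which is why two different brands still score fairly high. For spelling variants like "c.o.m.p.a.n.y A" or "comp A" from the question, a character-level comparison may be a better fit. Below is a minimal sketch of that idea, not a tested solution for real data: it strips punctuation as suggested in the comments, then compares every pair with the standard library's `difflib.SequenceMatcher`. The helper names `normalize` and `similar_pairs` and the `0.6` threshold are my own assumptions for illustration; the threshold in particular would need tuning on your dataset.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similar_pairs(names, threshold=0.6):
    """Return (name_a, name_b, score) for each pair of names whose
    normalized forms have a similarity ratio at or above the threshold.

    The threshold is an assumed starting point, not a recommended value.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


names = ["company A", "c.o.m.p.a.n.y A", "comp A", "Pepsi"]
for a, b, score in similar_pairs(names):
    print(a, "<->", b, score)
```

With this sample list, the three "company A" variants are paired with each other and "Pepsi" is not paired with anything. Comparing every pair is quadratic in the number of names, so for a large dataset you would probably want to group candidates first (for example by first letter or by normalized prefix) before scoring them.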
