Compare strings with the previous row and calculate the similarity Pandas

Question

Is there any way we can use Pandas to calculate the string similarity with the previous rows in the column?

Row 1: Businesses Pte Ltd
Row 2: Business Pvt Ltd
Row 3: Global Pvt Ltd

It should compare Row 1 with Row 2 and compute a similarity percentage. If the similarity is about 90%, Row 2 is replaced with the value from Row 1, and so on down the column.

Result

Row 1: Businesses Pte Ltd
Row 2: Businesses Pte Ltd
Row 3: Global Pvt Ltd

Can you provide a definition for "percentage of similarity"? — jpp, Mar 06 '18 at 09:22
Can be based on the number of chars, how many chars are different from the previous row.. — newtoCS, Mar 06 '18 at 09:26
That's interesting. I'm afraid SO isn't the best place to design the *logic* for you (but see links in @Matthew's answer for ideas). You will find many people here who are willing to take your logic and transfer it to code in an efficient way. — jpp, Mar 06 '18 at 09:40

score 2 · Answer 1 · answered Mar 06 '18 at 09:23

This is a surprisingly tricky problem. Presumably you sorted the rows alphabetically first - but what happens if the typo is in the first letter? After sorting, "Businesses Pte Ltd" is a long way from "Vusinesses Pte Ltd", so the two would never be compared as adjacent rows.
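To make the sorting concern concrete, here is a small sketch. The sample names are hypothetical and only illustrate how a typo in the first letter can push near-duplicates apart after an alphabetical sort.

```python
import pandas as pd

# Hypothetical sample data: two near-duplicates that differ only in the first letter.
names = pd.Series(["Vusinesses Pte Ltd", "Global Pvt Ltd", "Businesses Pte Ltd"])

# Alphabetical sort, as the answer presumes was done beforehand.
sorted_names = names.sort_values().reset_index(drop=True)

# Positions of the two near-duplicates in the sorted column.
pos_b = sorted_names[sorted_names == "Businesses Pte Ltd"].index[0]
pos_v = sorted_names[sorted_names == "Vusinesses Pte Ltd"].index[0]
rows_apart = pos_v - pos_b

print(sorted_names.tolist())
print(rows_apart)
```

Because an unrelated name sorts between them, a comparison that only looks at the previous row would never see the two near-duplicates side by side.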

Still - to solve your problem you want to combine these two solutions:

Find the similarity percent between two strings

Comparing previous row values in Pandas DataFrame

It should get you something workable.
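As a rough sketch of how the two ideas might be combined: the code below assumes `difflib.SequenceMatcher` from the standard library as the similarity measure (one possible definition, not necessarily what the linked answers use) and walks down the column comparing each value with the previously kept one. The function names `similarity` and `replace_similar_with_previous` are made up for this illustration.

```python
from difflib import SequenceMatcher

import pandas as pd


def similarity(a: str, b: str) -> float:
    """Return a 0-100 score based on difflib's ratio (an assumed metric)."""
    return SequenceMatcher(None, a, b).ratio() * 100


def replace_similar_with_previous(names: pd.Series, threshold: float = 90.0) -> pd.Series:
    """Replace a value with the previous kept value when they are similar enough.

    Assumption: each row is compared with the previous row *after* replacement,
    so a run of near-duplicates collapses to the first spelling in the run.
    """
    result = []
    for value in names:
        if result and similarity(result[-1], value) >= threshold:
            result.append(result[-1])  # close enough: reuse the previous spelling
        else:
            result.append(value)  # different enough: keep as-is
    return pd.Series(result, index=names.index, name=names.name)


company = pd.Series(["Businesses Pte Ltd", "Business Pvt Ltd", "Global Pvt Ltd"])
score = similarity(company[0], company[1])
cleaned = replace_similar_with_previous(company, threshold=85)
print(round(score, 1))
print(cleaned.tolist())
```

With this particular metric, the first two rows of the question's example score a little below 90, so the usage above lowers the threshold to 85 to reproduce the desired result. A different metric (for example an edit-distance based one) would give different scores, so the threshold needs tuning against real data.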
