How do I calculate the Levenshtein distance between two Pandas DataFrame columns?

Question

I'm trying to calculate the Levenshtein distance between two Pandas columns, but I'm getting stuck. I'm using the textdistance library. Here is a minimal, reproducible example:

import pandas as pd
from textdistance import levenshtein

attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor']]

df = pd.DataFrame(attempts, columns=['password', 'attempt'])

   password  attempt
0  passw0rd  pasw0rd
1   passwrd   psword
2    psw0rd  passwor

My poor attempt:

df.apply(lambda x: levenshtein.distance(*zip(x['password'] + x['attempt'])), axis=1)

This is how the function works. It takes two strings as arguments:

levenshtein.distance('helloworld', 'heloworl')

Out[1]: 2
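For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The following is a minimal pure-Python dynamic-programming sketch, not textdistance's own implementation; the name `levenshtein_distance` is hypothetical. It reproduces the result above: dropping one `l` and the trailing `d` turns 'helloworld' into 'heloworl'.

```python
def levenshtein_distance(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t (classic two-row DP).
    Hypothetical helper, not part of textdistance."""
    # previous[j] is the distance between the first i-1 chars of s
    # and the first j chars of t.
    previous = list(range(len(t) + 1))
    for i, s_char in enumerate(s, start=1):
        current = [i]  # turning s[:i] into "" takes i deletions
        for j, t_char in enumerate(t, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance('helloworld', 'heloworl'))  # 2
```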

Have a look at [this](https://stackoverflow.com/questions/13636848/is-it-possible-to-do-fuzzy-match-merge-with-python-pandas/56315491#56315491) post by Erfan; it goes over how to use the fuzzywuzzy package, which implements the Levenshtein distance algorithm to match words. — Umar.H, Jan 31 '20 at 15:53
Sounds like [this question](https://stackoverflow.com/questions/12376863/adding-calculated-columns-to-a-dataframe-in-pandas) might help? — Nightara, Jan 31 '20 at 15:53
@Datanovice I don't think it's about the Levenshtein function (Since the question already includes an import to calculate that), but about how to apply it to a DF. — Nightara, Jan 31 '20 at 15:55
When you use `apply` with `axis=1`, each row is passed to your `lambda` as a `Series` `x`. Why do you zip them? Just pass them as `x['password']` etc. — anishtain4, Jan 31 '20 at 16:04
Does this answer your question? [Edit distance between two pandas columns](https://stackoverflow.com/questions/42892617/edit-distance-between-two-pandas-columns) — Abu Shoeb, Apr 28 '21 at 19:12
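To see what goes wrong in the original attempt, here is a small pure-Python sketch of what the lambda builds for the first row (plain strings stand in for the row's `Series` values). `+` concatenates the two strings into one, and `zip()` over a single iterable yields one-element tuples, so `levenshtein.distance(*zip(...))` receives 15 one-character tuples instead of the two strings it expects.

```python
# First row as the lambda sees it (plain strings instead of a Series row).
password, attempt = 'passw0rd', 'pasw0rd'

# `+` concatenates the two values into a single string ...
joined = password + attempt   # 'passw0rdpasw0rd'

# ... and zip() over one iterable yields 1-tuples, one per character.
args = list(zip(joined))

print(args[:3])   # [('p',), ('a',), ('s',)]
print(len(args))  # 15 separate arguments once unpacked with *
```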

score 12 · Accepted Answer · answered Jan 31 '20 at 15:56

Maybe I'm missing something, but is there a reason you don't like the lambda expression? This works for me:

import pandas as pd
from textdistance import levenshtein

attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor'],
            ['helloworld', 'heloworl']]

df = pd.DataFrame(attempts, columns=['password', 'attempt'])

df.apply(lambda x: levenshtein.distance(x['password'], x['attempt']), axis=1)

Output:

0    1
1    3
2    4
3    2
dtype: int64

or with `map`: `df.assign(distance=[*map(levenshtein.distance, df.password, df.attempt)])` — piRSquared, Jan 31 '20 at 16:05
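A self-contained sketch of the `map`/`assign` approach from the comment above, assuming pandas is installed. So that it runs without textdistance, `levenshtein_distance` is a hypothetical pure-Python stand-in for `levenshtein.distance`. Mapping over the two columns in parallel avoids building a `Series` for every row, which `apply(..., axis=1)` does, and it reproduces the distances from the accepted answer.

```python
import pandas as pd


def levenshtein_distance(s: str, t: str) -> int:
    # Hypothetical stand-in for textdistance's levenshtein.distance:
    # classic two-row dynamic programming over insert/delete/substitute.
    previous = list(range(len(t) + 1))
    for i, s_char in enumerate(s, start=1):
        current = [i]
        for j, t_char in enumerate(t, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor'],
            ['helloworld', 'heloworl']]
df = pd.DataFrame(attempts, columns=['password', 'attempt'])

# map() pairs the two columns element-wise; assign() stores the result.
df = df.assign(distance=list(map(levenshtein_distance,
                                 df['password'], df['attempt'])))
print(df)
```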
