Questions tagged [edit-distance]

A string metric describing the differences between two strings. More specifically, it is the number of operations that transform one string into another string. Operations include the insertion, deletion, substitution, or transposition of a character in the string. Operations can be considered in combinations and may have different costs.

References

Edit distance (Wikipedia)

256 questions
107
votes
7 answers

Levenshtein distance in T-SQL

I am interested in algorithm in T-SQL calculating Levenshtein distance.
Alexander Prokofyev
  • 33,874
  • 33
  • 95
  • 118
56
votes
7 answers

String similarity metrics in Python

I want to find string similarity between two strings. en.wikipedia has examples of some of them. code.google has a Python implementation of Levenshtein distance. Is there a better algorithm, (and hopefully a Python library), under these…
agiliq
  • 7,518
  • 14
  • 54
  • 74
49
votes
10 answers

Figure out if a business name is very similar to another one - Python

I'm working with a large database of businesses. I'd like to be able to compare two business names for similarity to see if they possibly might be duplicates. Below is a list of business names that should test as having a high probability of being…
Chris Dutrow
  • 48,402
  • 65
  • 188
  • 258
36
votes
9 answers

Levenshtein distance: how to better handle words swapping positions?

I've had some success comparing strings using the PHP levenshtein function. However, for two strings which contain substrings that have swapped positions, the algorithm counts those as whole new substrings. For example: levenshtein("The quick brown…
thomasrutter
  • 114,488
  • 30
  • 148
  • 167
29
votes
1 answer

Similarity scores based on string comparison in R (edit distance)

I am trying to assign similarity score based on comparison between 2 strings. Is there a function for the same in R. I am aware of such a function in SAS by the name of SPEDIS. Please let me know if there is such a function in R.
Kunal Batra
  • 1,001
  • 3
  • 15
  • 23
26
votes
3 answers

Normalizing the edit distance

I have a question that can we normalize the levenshtein edit distance by dividing the e.d value by the length of the two strings? I am asking this because, if we compare two strings of unequal length, the difference between the lengths of the two…
24
votes
6 answers

Word-level edit distance of a sentence

Is there an algorithm that lets you find the word-level edit distance between 2 sentences? For eg., "A Big Fat Dog" and "The Big House with the Fat Dog" have 1 substitute, 3 insertions
AutoC
  • 265
  • 1
  • 2
  • 4
23
votes
10 answers

Shortest path to transform one word into another

For a Data Structures project, I must find the shortest path between two words (like "cat" and "dog"), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example: cat -> bat -> bet -> bot -> bog ->…
dacman
  • 609
  • 1
  • 8
  • 17
22
votes
4 answers

Edit distance between two graphs

I'm just wondering if, like for strings where we have the Levenshtein distance (or edit distance) between two strings, is there something similar for graphs? I mean, a scalar measure that identifies the number of atomic operations (node and edges…
linello
  • 8,451
  • 18
  • 63
  • 109
20
votes
1 answer

How do you implement Levenshtein distance in Delphi?

I'm posting this in the spirit of answering your own questions. The question I had was: How can I implement the Levenshtein algorithm for calculating edit-distance between two strings, as described here, in Delphi? Just a note on performance: This…
JosephStyons
  • 57,317
  • 63
  • 160
  • 234
18
votes
5 answers

Java: Difference between two lists

My company's cat-herding application tracks a convoy of cats. Periodically, it needs to compare previousOrder to currentOrder (each is an ArrayList) and notify the cat-wranglers of any changes. Each cat is unique and can only appear once in…
Tony the Pony
  • 40,327
  • 71
  • 187
  • 281
17
votes
5 answers

Shortest sequence of operations transforming a file tree to another

Given two file trees A and B, is it possible to determine the shortest sequence of operations or a short sequence of operations that is necessary in order to transform A to B? An operation can be: Create a new, empty folder Create a new file with…
Benoit
  • 76,634
  • 23
  • 210
  • 236
17
votes
2 answers

How can I determine Levenshtein distance for Mandarin Chinese characters?

We are developing a system to do fuzzy matching on over 50 international languages using the UTF-8, UTF-16, and UTF-32 Unicode character standard. So far, we have been able to use Levenshtein distance to detect misspellings of German Unicode…
Frank
  • 1,406
  • 2
  • 16
  • 42
16
votes
2 answers

Approximate substring matching using a Suffix Tree

This article discusses approximate substring matching techniques that utilize a suffix tree to improve matching time. Each answer addresses a different algorithm. Approximate substring matching attempts to find a substring (pattern) P in a string T…
Pooven
  • 1,744
  • 1
  • 25
  • 44
15
votes
2 answers

Algorithm to find edit distance to all substrings

Given 2 strings s and t. I need to find for each substring in s edit distance(Levenshtein distance) to t. Actually I need to know for each i position in s what is the minimum edit distance for all substrings started at position i. For example: t =…
1
2 3
17 18