Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is a specific algorithm of edit distance algorithms.

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions
435
votes
14 answers

Getting the closest string match

I need a way to compare multiple strings to a test string and return the string that closely resembles it: TEST STRING: THE BROWN FOX JUMPED OVER THE RED COW CHOICE A : THE RED COW JUMPED OVER THE GREEN CHICKEN CHOICE B : THE RED COW JUMPED…
191
votes
2 answers

High performance fuzzy string comparison in Python, use Levenshtein or difflib

I am doing clinical message normalization (spell check) in which I check each given word against 900,000 word medical dictionary. I am more concern about the time complexity/performance. I want to do fuzzy string comparison, but I'm not sure which…
Maggie
  • 5,923
  • 8
  • 41
  • 56
121
votes
6 answers

What algorithm gives suggestions in a spell checker?

What algorithm is typically used when implementing a spell checker that is accompanied with word suggestions? At first I thought it might make sense to check each new word typed (if not found in the dictionary) against it's Levenshtein distance from…
Mithrax
  • 7,603
  • 18
  • 55
  • 60
110
votes
1 answer

Difference between Jaro-Winkler and Levenshtein distance?

I want to do fuzzy matching of millions of records from multiple files. I identified two algorithms for that: Jaro-Winkler and Levenshtein edit distance. I was not able to understand what the difference is between the two. It seems Levenshtein gives…
Bhavesh Shah
  • 3,299
  • 11
  • 49
  • 73
107
votes
7 answers

Levenshtein distance in T-SQL

I am interested in algorithm in T-SQL calculating Levenshtein distance.
Alexander Prokofyev
  • 33,874
  • 33
  • 95
  • 118
81
votes
7 answers

How to calculate distance similarity measure of given 2 strings?

I need to calculate the similarity between 2 strings. So what exactly do I mean? Let me explain with an example: The real word: hospital Mistaken word: haspita Now my aim is to determine how many characters I need to modify the mistaken word to…
Furkan Gözükara
  • 22,964
  • 77
  • 205
  • 342
76
votes
6 answers

Fuzzy search algorithm (approximate string matching algorithm)

I wish to create a fuzzy search algorithm. However, upon hours of research I am really struggling. I want to create an algorithm that performs a fuzzy search on a list of names of schools. This is what I have looked at so far: Most of my research…
Yahya Uddin
  • 26,997
  • 35
  • 140
  • 231
61
votes
4 answers

Levenshtein Distance in VBA

I have excel sheet with data which I want to get Levenshtein Distance between them. I already tried to export as text, read in from script (php), run Levenshtein (calculate Levenshtein Distance), save it to excel again. But I am looking for a way to…
Yousf
  • 3,957
  • 3
  • 27
  • 37
57
votes
2 answers

Hamming Distance vs. Levenshtein Distance

For the problem I'm working on, finding distances between two sequences to determine their similarity, sequence order is very important. However, the sequences that I have are not all the same length, so I pad any deficient strings with empty points…
don
  • 820
  • 1
  • 6
  • 10
56
votes
7 answers

String similarity metrics in Python

I want to find string similarity between two strings. en.wikipedia has examples of some of them. code.google has a Python implementation of Levenshtein distance. Is there a better algorithm, (and hopefully a Python library), under these…
agiliq
  • 7,518
  • 14
  • 54
  • 74
55
votes
7 answers

Sort an array by the "Levenshtein Distance" with best performance in Javascript

So I have a random javascript array of names... [@larry,@nicholas,@notch] etc. They all start with the @ symbol. I'd like to sort them by the Levenshtein Distance so that the the ones at the top of the list are closest to the search term. At the…
alt
  • 13,357
  • 19
  • 80
  • 120
51
votes
4 answers

Levenshtein Distance Algorithm better than O(n*m)?

I have been looking for an advanced levenshtein distance algorithm, and the best I have found so far is O(n*m) where n and m are the lengths of the two strings. The reason why the algorithm is at this scale is because of space, not time, with the…
Jason
  • 14,517
  • 25
  • 92
  • 153
50
votes
9 answers

Implementation of Levenshtein distance for mysql/fuzzy search?

I would like to be able to search a table as follows for smith as get everything that it within 1 variance. Data: O'Brien Smithe Dolan Smuth Wong Smoth Gunther Smiht I have looked into using Levenshtein distance does anyone know how to implement…
Andrew Clark
  • 1,393
  • 1
  • 9
  • 16
47
votes
2 answers

How to create simple fuzzy search with PostgreSQL only?

I have a little problem with search functionality on my RoR based site. I have many Produts with some CODEs. This code can be any string like "AB-123-lHdfj". Now I use ILIKE operator to find products: Product.where("code ILIKE ?", "%" +…
Alve
  • 1,315
  • 2
  • 17
  • 16
44
votes
5 answers

How to compare almost similar Strings in Java? (String distance measure)

I would like to compare two strings and get some score how much these look alike. For example "The sentence is almost similar" and "The sentence is similar". I'm not familiar with existing methods in Java, but for PHP I know the levenshtein…
hsmit
  • 3,906
  • 7
  • 34
  • 46
1
2 3
64 65