Questions tagged [damerau-levenshtein]
17 questions
4
votes
0 answers
Levenshtein distance, multiple paths
Edit: TL;DR version: how to get all possible backtraces for Damerau–Levenshtein distance between two words? I'm using https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm in order to compute the distance, and a trivial backtrace algorithm…

Victor Istomin
4
votes
1 answer
Modify Damerau-Levenshtein algorithm to track transformations (insertions, deletions, etc)
I'm wondering how to modify the Damerau-Levenshtein algorithm to track the specific character transformations required to change a source string to a target string. This question has been answered for the Levenshtein distance, but I couldn't find…

Dan Zheng
3
votes
1 answer
Find all pairs of similar words
I have some 40000 words and want to find all similar pairs. For similarity, I use a sort of Damerau–Levenshtein distance scaled by the word lengths. For simplicity, I don't consider overlapping edits (just like the linked algorithm). All words (most…

maaartinus
3
votes
0 answers
Suggestion for limiting fuzzy search suggestion results
I've implemented a fuzzy search algorithm based on an N-closest-neighbors query for given search terms. Each query returns a preset number of raw results, in my case a maximum of 200 hits per query, sorted by score in descending order.
The…

Andreas W. Wylach
2
votes
1 answer
Is there public data for OCR-based character distance?
I am looking for "character visual similarity" weighted data (not an algorithm) to plug into a weighted-Damerau-Levenshtein algorithm.
The Problem
Currently, I am using Google's Vision AI (a paid OCR service) to perform the OCR conversion of an…

Andy R
2
votes
0 answers
Using Python to extract the specific edit when Damerau-Levenshtein distance equals 1
I have a large Pandas dataframe containing data entered at a keyboard. One of the columns in the dataframe represents UK postcode data. Inevitably, with large datasets, there are a number of typing errors. I'm using the pyxDamerauLevenshtein library…

user1718097
2
votes
1 answer
Extracting operations from Damerau-Levenshtein
The Damerau-Levenshtein distance tells you the number of additions, deletions, substitutions and transpositions between two words (the latter is what differentiates DL from Levenshtein distance).
The algo is on wikipedia and relatively…

tenpn
1
vote
2 answers
Damerau-Levenshtein distance between two vectors
The Damerau-Levenshtein distance between the two strings "abc" and "acb" would be 1, because it involves one transposition between "b" and "c".
> stringdist("abc", "acb", method = "dl")
[1] 1
Now suppose that I have the following two character…

Ian
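For readers who want to check the "abc" / "acb" example above by hand, here is a minimal Python sketch. It assumes the restricted variant of the metric (optimal string alignment, where no substring is edited more than once), which is not necessarily what `stringdist` computes for `method = "dl"`; the function name `osa_distance` is made up for this illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Illustrative sketch only: counts insertions, deletions, substitutions
    and transpositions of two adjacent characters, each at cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(osa_distance("abc", "acb"))  # 1: one transposition of "b" and "c"
```

The restricted and unrestricted variants agree on this pair but can differ elsewhere: for "ca" and "abc" this sketch returns 3, whereas the unrestricted Damerau-Levenshtein distance is 2.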
1
vote
1 answer
Finding which error(s) are detected by Damerau-Levenshtein edit distance algorithm
I'm creating a spelling correction tool and wanted to implement a noisy channel with Bayes theorem. In order to do so, I need to calculate the probability P(X|W), where X is the given (misspelled) word, and W is the possible correction. The…

shahaf hermann
1
vote
2 answers
How to choose the proper maximum value for Damerau-Levenshtein distance?
I am using the Damerau-Levenshtein code available from here in my similarity measurements. The problem is that when I apply Damerau-Levenshtein to two strings such as "cat sat on a mat" and "dog sat mat", I get an edit distance of 8. This…

Bilgin
0
votes
0 answers
Strange output of the `adist` function in R (string distance)
Why is output not equal to a 1 * 1 matrix here?
EDIT : the strange behaviour comes from the diag function
dist <- adist("errors", "eror", costs = c(1, 1, 1), counts = T)
dist
[,1]
[1,] 2
attr(,"counts")
, , ins
[,1]
[1,] 0
, ,…

Julien
0
votes
0 answers
Is there any specific way to implement Damerau-Levenshtein distances so they perform better on integer pairs?
Say I have a list of numbers composed of a sequential ID followed by two mod11 check digits, for example:
ls = [191, 272, 353, 434, 515, 604, 787, 868, 949, 1082, 1163, 1244, 1325, 1406,
1597, 1678, 1759, 1830, 1910, 2054, 2135, 2216, 2305,…

Maav
0
votes
1 answer
using Damerau-Levenshtein distance to compare sets of text in code.org
Not very knowledgeable with coding, I usually use block coding and not typing.
I've used many different Levenshtein distance codes I've found online and most of them didn't work for one reason or another
var levDist = function (s, t) {
var d =…

AnonHooman
0
votes
0 answers
Algorithm to find one edit distance words from input word using Levenshtein distance?
I have a dictionary containing a large number of words (approximately 100000). I take one misspelled word from the user, for example "andd". The user's word is always wrong and always within an edit distance of one. My program scans the whole dictionary and finds…
0
votes
1 answer
Fuzziness not behaving as expected in Elasticsearch
I am testing a few cases for my Elasticsearch project. The result returned by the fuzzy query is confusing for one particular case:
While searching for the keyword Mall with fuzziness 2 applied in a multi-match query its…

Ashit_Kumar