I'm not sure whether this question has been asked before, but I would like to know more about an optimized Levenshtein distance implementation in R, Java, or Python. I have a text file containing numerous strings, one per line (close to 2000 records, as shown below), in alphabetical order, some of which may be similar to each other. I want to compare all pairs of strings in the file and output the distance matrix. Also, please let me know how to use this matrix to filter the set of strings based on my requirement, say LD <= 2.
Let me know if the question is not clear or you need more information.
Sample Text File
----------------
abc
abcd
abe
bac
bad
back
blade
cub
cube
cute
dump
duke
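
To make the goal concrete, here is a minimal Python sketch of what I have in mind, not a tuned implementation. It assumes the plain Levenshtein distance (insert, delete, substitute, each costing 1, no transpositions) and uses the sample strings above inline; the function names levenshtein, distance_matrix and close_pairs, and the file name strings.txt in the comment, are made up for illustration.

```python
def levenshtein(a, b):
    """Levenshtein distance using two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def distance_matrix(words):
    """Full symmetric matrix; only the upper triangle is computed."""
    n = len(words)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = levenshtein(words[i], words[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


def close_pairs(words, matrix, max_distance=2):
    """Pairs of distinct entries whose distance is <= max_distance."""
    return [(words[i], words[j], matrix[i][j])
            for i in range(len(words))
            for j in range(i + 1, len(words))
            if matrix[i][j] <= max_distance]


# In practice the list would come from the file, for example
# (hypothetical file name):
#   with open("strings.txt") as f:
#       words = [line.strip() for line in f if line.strip()]
words = ["abc", "abcd", "abe", "bac", "bad", "back",
         "blade", "cub", "cube", "cute", "dump", "duke"]

matrix = distance_matrix(words)
pairs = close_pairs(words, matrix, max_distance=2)
for left, right, dist in pairs:
    print(left, right, dist)
```

With about 2000 strings this means roughly two million pairwise comparisons, so I am mainly interested in whether there is a faster way than this straightforward double loop.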