Is there a package that provides a Levenshtein distance function implemented in C or Fortran? I have many strings to compare, and `stringMatch` from `MiscPsycho` is too slow for this.
34
4 Answers
21
`stringdist` in the `stringdist` package does it too, and is even faster than `levenshteinDist` under certain conditions (1).
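For illustration, here is a minimal sketch of how that could look, assuming the `stringdist` package is installed. The example strings are made up, and `nthread = 2` is just an arbitrary choice to show the multi-core option.

```r
library(stringdist)  # assumes install.packages("stringdist") has been run

# Made-up example strings, not from the question
a <- c("kitten", "flaw", "saturday")
b <- c("sitting", "lawn", "sunday")

# Element-wise Levenshtein distance: a[i] is compared with b[i]
d <- stringdist(a, b, method = "lv")
print(d)  # 3 2 3

# Full distance matrix: every element of a against every element of b.
# nthread = 2 is an arbitrary value chosen for this sketch.
m <- stringdistmatrix(a, b, method = "lv", nthread = 2)
print(m)
```

`method = "lv"` selects plain Levenshtein distance; the package supports other metrics as well, so it is worth setting the method explicitly.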

Ben
stringdist has sped up significantly since that blog you link to: it now uses multiple cores. - Feb 26 '16 at 17:02
17
`levenshteinDist` (from the `RecordLinkage` package) calls compiled C code. Give it a try.
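A minimal sketch of a call, assuming `RecordLinkage` is installed; the example strings are made up for illustration.

```r
library(RecordLinkage)  # assumes the package is installed

# Made-up example strings, not from the question
a <- c("kitten", "flaw", "saturday")
b <- c("sitting", "lawn", "sunday")

# Vectorised over both arguments: a[i] is compared with b[i]
d <- levenshteinDist(a, b)
print(d)  # 3 2 3
```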

MichaelChirico

gd047
Just noting the RecordLinkage package is apparently no longer maintained and has been pulled from CRAN. The `stringdist` package is the solution now. - Brian Stamper Feb 27 '20 at 17:42
Just noting the RecordLinkage package is *not* pulled from CRAN, it's just available: https://cran.r-project.org/web/packages/RecordLinkage/ - MS Berends Aug 12 '22 at 19:41
6
You could try `stringDist` from `Biostrings` as well.

MichaelChirico

Aaron Statham
1
You could also use `levenshtein_distance()` from the `textTinyR` package. With larger character vectors (around 30k characters), all the other packages gave me 'calloc' memory errors. Only `textTinyR` worked for me!

interrobang