Is there a way to check whether two strings are approximately the same?

Question

Consider the following two strings: applesauce and apple-sauce. They refer to the same object, so any records containing these two names should be considered duplicates. However, R treats them as separate factor levels. Could one use edit distance, for example via the stringdist package, to quantify how similar these two names are?

Or you could use the default `adist()` function. So it's possible to use edit distance, but that can often get messy. If you just want to ignore non-letter characters such as dashes or other punctuation, then you can use a regular expression to strip those characters out. You need to be much more explicit about what you want to do with your data in order to turn this into a specific programming question. — MrFlick, Mar 02 '15 at 02:04
You might also want to look at tools like [OpenRefine](http://openrefine.org/) which can be pretty handy for resolving such issues. — A5C1D2H2I1M1N2O1R2T1, Mar 02 '15 at 02:16
You might also look at the RecordLinkage package and the `agrep()` function of base R. For example, `agrep("applesauce", "apple-sauce", ignore.case = TRUE, max.distance = 0.4)`. — lawyeR, Mar 02 '15 at 02:44
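The edit-distance suggestions in the comments above can be sketched with base R alone. This is a minimal illustration, not a recommended threshold: the variable names, the normalisation by the longer string's length, and the cut-off of one edit are all assumptions made for the example. The stringdist package mentioned in the question offers a comparable `stringdist()` function.

```r
a <- "applesauce"
b <- "apple-sauce"

# Levenshtein (edit) distance via adist() from the utils package,
# which is attached by default. adist() returns a matrix, so take [1, 1].
d <- adist(a, b)[1, 1]

# Assumed normalisation: scale by the longer string to get a
# similarity score between 0 and 1.
similarity <- 1 - d / max(nchar(a), nchar(b))

# agrepl() gives TRUE/FALSE for an approximate match of the pattern `a`
# somewhere within `b`; max.distance = 1L allows at most one edit
# (an assumed cut-off for this example).
is_close <- agrepl(a, b, max.distance = 1L)

print(d)
print(similarity)
print(is_close)
```

Note that `agrepl()` looks for the pattern inside the target, so it is not symmetric in its two arguments, whereas `adist()` compares the whole strings.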

score 0 · Answer 1 · edited May 23 '17 at 12:03


How about this?

"applesauce" == gsub("-", "", "apple-sauce")

For multiple patterns, as in "applesauce" == "apple - sauce", you can use the approach described in "Replace multiple arguments with gsub".
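As a rough sketch of stripping several characters at once, a single regular-expression character class can stand in for multiple `gsub()` calls. The helper name `normalise_name` and the choice to lower-case and drop every non-alphanumeric character are assumptions for illustration; whether that is appropriate depends on the data.

```r
# Hypothetical helper: lower-case the input and remove every character
# that is not a letter or digit, so spacing and punctuation variants
# collapse to the same string.
normalise_name <- function(x) {
  gsub("[^[:alnum:]]", "", tolower(x))
}

# Handles both the dash and the surrounding spaces in one pass.
same <- normalise_name("apple - sauce") == "applesauce"

# Applying the helper before building a factor merges the variants
# into a single level.
fruit <- c("applesauce", "apple-sauce", "Apple Sauce")
merged_levels <- levels(factor(normalise_name(fruit)))

print(same)
print(merged_levels)
```

This only treats names as duplicates when they differ in case, spacing, or punctuation; misspellings would still need an edit-distance comparison.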


answered Mar 02 '15 at 02:57

jbest
