I'd like to know whether it is feasible, and which functions or tools in R are recommended, to loop through a column in one data frame and compare each of its strings against every string in a column of another data frame.

The idea is to assign a "yes/no/maybe" label to each string depending on whether a possible match exists in the other data frame. There may be spelling errors or typos; I just want to narrow down the search for matches.

Is this something R can handle pretty well?

ssmit474
  • Could you provide in your question the desired behavior, a specific problem or error, and the shortest code necessary to reproduce it? Questions without a clear problem statement are not useful to other readers. See: [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve). – Artem Oct 04 '18 at 21:38

1 Answer

Welcome to Stack Overflow! There is a function called adist which computes the Levenshtein edit distance between two strings (see similar question here).
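As a quick illustration of what adist returns, here is a minimal sketch on two made-up string pairs (the strings are only illustrative, not from the question). It assumes the default costs, where each insertion, deletion, or substitution counts as 1.

```r
# adist() ships with base R (utils package); no extra install is needed.
# It returns a matrix, here 1 x 1 because each argument holds one string.
d <- adist("kitten", "sitting")
d[1, 1]  # 3: two substitutions and one insertion

# ignore.case = TRUE treats upper- and lower-case letters as equal.
d_ci <- adist("Strings", "strings", ignore.case = TRUE)
d_ci[1, 1]  # 0
```

If your data mixes capitalization, ignore.case may be worth turning on before deciding on thresholds.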

Without knowing your specific use-case, we can make up an example:

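# Example data: two columns of strings to compare row by row
df <- data.frame(a = c('comparing', 'strings', 'between', 'dataframes'),
                 b = c('comparing', 'integers', 'between', 'data.frames'),
                 stringsAsFactors = FALSE)

# Edit distance between a[i] and b[i] for each row
df$levenshtein <- mapply(adist, df$a, df$b)

# Default to 'maybe', then overwrite exact matches and clear mismatches
df$ismatch <- 'maybe'
df$ismatch[df$levenshtein == 0] <- 'yes'
df$ismatch[df$levenshtein >= 3] <- 'no'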

df
# gives:
           a           b levenshtein ismatch
1  comparing   comparing           0     yes
2    strings    integers           6      no
3    between     between           0     yes
4 dataframes data.frames           1   maybe

You can change the range for the 'maybe' answer of course.
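The example above compares the two columns row by row. If the columns live in different data frames with different row counts, as the question describes, one possible approach is to pass both columns to adist at once, which gives a full distance matrix, and then keep the closest candidate for each string. This is only a sketch: df1, df2, the column name `name`, and the thresholds are assumptions made up for illustration.

```r
# Hypothetical example data: two data frames with different row counts.
df1 <- data.frame(name = c("comparing", "strings", "dataframes"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("data.frames", "comparing", "integers", "between"),
                  stringsAsFactors = FALSE)

# adist() accepts two character vectors and returns a matrix of edit
# distances: one row per string in df1, one column per string in df2.
d <- adist(df1$name, df2$name)

# For each string in df1, find the column index of the closest string in df2.
best <- apply(d, 1, which.min)
df1$closest  <- df2$name[best]
df1$distance <- d[cbind(seq_len(nrow(d)), best)]

# Same illustrative thresholds as before: 0 is 'yes', 3 or more is 'no'.
df1$ismatch <- ifelse(df1$distance == 0, "yes",
                      ifelse(df1$distance >= 3, "no", "maybe"))

df1
```

Note that the matrix has nrow(df1) * nrow(df2) entries, so for very large columns it may be worth comparing in chunks.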

C. Braun