
In R, I have two character vectors, a and b.

a <- c("abcdefg", "hijklmnop", "qrstuvwxyz")
b <- c("abXdeXg", "hiXklXnoX", "Xrstuvwxyz")

I want a function that counts the character mismatches between each element of a and the corresponding element of b. Using the example above, such a function should return c(2,3,1). There is no need to align the strings. I need to compare each pair of strings character-by-character and count matches and/or mismatches in each pair. Does any such function exist in R?

Or, to ask the question in another way, is there a function to give me the edit distance between two strings, where the only allowed operation is substitution (ignore insertions or deletions)?

smci
Ryan C. Thompson

2 Answers


Using some mapply fun:

mapply(function(x,y) sum(x!=y),strsplit(a,""),strsplit(b,""))
#[1] 2 3 1
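
As a minimal sketch, the one-liner above could be wrapped in a function that also checks that each pair of strings has the same length, since `!=` would otherwise recycle the shorter vector. The name `substitution_count` is hypothetical and not part of the original answer; only base R is assumed.

```r
# Hypothetical helper: counts position-by-position mismatches for each
# pair of strings. Assumes paired strings have equal length.
substitution_count <- function(a, b) {
  stopifnot(length(a) == length(b), all(nchar(a) == nchar(b)))
  chars_a <- strsplit(a, "")
  chars_b <- strsplit(b, "")
  # Compare the characters at each position and count the differences.
  unname(mapply(function(x, y) sum(x != y), chars_a, chars_b))
}

a <- c("abcdefg", "hijklmnop", "qrstuvwxyz")
b <- c("abXdeXg", "hiXklXnoX", "Xrstuvwxyz")
substitution_count(a, b)
#[1] 2 3 1

# Repeated letters are handled, because the comparison is positional.
substitution_count("aaaaaaa", "aaaXaaa")
#[1] 1
```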
thelatemail
  • I'm sorry, but that doesn't do what I asked. It happens to give the correct answer on my example, but it would not work if the strings had repeated letters. For example, consider `a <- "aaaaaaa"; b <- "aaaXaaa"`. Your code would return 6 mismatches when the correct answer is 1. – Ryan C. Thompson Jun 24 '13 at 22:28
  • @RyanThompson - Okay - have adjusted the answer to account for repeats. – thelatemail Jun 24 '13 at 22:33
  • For clarity I'd rename the vars and the fn: `substitution_distance <- function(s1,s2) { mapply(function(c1,c2) sum(c1!=c2), strsplit(s1,''), strsplit(s2,'')) }` – smci May 17 '14 at 21:25

Another option is to use adist, which computes the approximate string distance between character vectors:

mapply(adist,a,b)
abcdefg  hijklmnop qrstuvwxyz 
     2          3          1 
agstudy
  • The two solutions aren't totally interchangeable, try: `a <- c("cdefgba", "hijklmnop", "qrstuvwxyz")`, my solution gives `c(7,3,1)` while `adist` gives `c(6,3,1)` – thelatemail Jun 25 '13 at 01:01
  • This answer allows indels, while I'm just asking for a character-by-character comparison. – Ryan C. Thompson Mar 24 '14 at 22:19
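
The difference raised in the comments can be illustrated with a small sketch. The strings `x` and `y` below are made up for illustration and are not from the original posts. Because `adist` computes a generalized Levenshtein distance, it may use insertions and deletions, so it can report fewer edits than a position-by-position comparison.

```r
x <- "abcdef"
y <- "bcdefa"  # x rotated left by one character (illustrative input)

# Position-by-position comparison: every one of the 6 positions differs.
positional <- sum(strsplit(x, "")[[1]] != strsplit(y, "")[[1]])

# adist() may delete the leading "a" and insert it at the end,
# which takes only two edits.
with_indels <- as.vector(adist(x, y))

positional
#[1] 6
with_indels
#[1] 2
```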