The problem with setdiff is that it treats its inputs as sets, so repeated characters are collapsed (give or take) and position is ignored.
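As a quick illustration of that set behaviour, here is a minimal sketch on two short strings. Because "e" already occurs somewhere in a, setdiff never reports it, even though b has an extra "e" at a position where a has nothing:

```r
a <- "what the h"
b <- "what the hel"
s.a <- strsplit(a, "")[[1]]
s.b <- strsplit(b, "")[[1]]

# Set semantics: only characters of b that appear nowhere in a are returned
setdiff(s.b, s.a)
# [1] "l"
```

A position-wise comparison would be expected to report both "e" and "l" here.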
The solution in the linked duplicate is incomplete: if the strings are different lengths, it can return a false negative.
Using that code:
a <- "what the h"
b <- "what the hel"
s.a <- strsplit(a, "")[[1]]
s.b <- strsplit(b, "")[[1]]
s.b[s.b != s.a]
# Warning in s.b != s.a :
# longer object length is not a multiple of shorter object length
# [1] "e" "l"
This result is correct, but what if instead b ended differently:
a <- "what the h"
b <- "what the hwh"
s.a <- strsplit(a, "")[[1]]
s.b <- strsplit(b, "")[[1]]
s.b[s.b != s.a]
# Warning in s.b != s.a :
# longer object length is not a multiple of shorter object length
# character(0)
This incorrectly returns character(0) because R recycles s.a to the same length as s.b. The length difference is two, and the first two letters of a are the same as the last two letters of b, so no differences are found.
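To make the recycling visible, here is a small sketch that builds the recycled vector explicitly with rep_len, which should mirror what R does implicitly inside the comparison. The variable name recycled is only for illustration:

```r
a <- "what the h"
b <- "what the hwh"
s.a <- strsplit(a, "")[[1]]
s.b <- strsplit(b, "")[[1]]

# Extend s.a to the length of s.b by starting over from its first element
recycled <- rep_len(s.a, length(s.b))
tail(recycled, 2)
# [1] "w" "h"

# Positions 11 and 12 of s.b are therefore compared against "w" and "h"
any(s.b != recycled)
# [1] FALSE
```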
<rant>
Recycling can be useful and a neat trick, but it causes problems often enough that in my opinion it should be an error, or at least something we can turn into an error via options.
</rant>
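For what it's worth, a stricter comparison can be approximated by checking the lengths by hand. The sketch below uses a hypothetical helper, strict_neq, which is not part of base R; note that it also rejects the ordinary length-one recycling that is often wanted:

```r
# strict_neq is a hypothetical helper, not part of base R
strict_neq <- function(x, y) {
  # Refuse to compare vectors of different lengths instead of recycling
  if (length(x) != length(y)) {
    stop("lengths differ: ", length(x), " vs ", length(y))
  }
  x != y
}

s.a <- strsplit("what the h", "")[[1]]
s.b <- strsplit("what the hwh", "")[[1]]

# Capture the error object rather than silently recycling
res <- tryCatch(strict_neq(s.b, s.a), error = function(e) e)
inherits(res, "error")
# [1] TRUE
```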
The only way around this is to compare the characters up to the length of the shorter of the two strings, and then append whatever lies beyond that.
If we aren't certain which string is longer, a more complete (yet still admittedly crude) answer might be:
a <- "what the h"
b <- "what the hel"
s.a <- strsplit(a, "")[[1]]
s.b <- strsplit(b, "")[[1]]
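common <- min(nchar(a), nchar(b))
i <- seq_len(common)  # seq_len() stays empty when common is 0, unlike 1:common
c(s.b[i][ s.b[i] != s.a[i] ],      # mismatches within the shared length
  s.a[seq_along(s.a) > common],    # leftover characters if a is longer
  s.b[seq_along(s.b) > common])    # leftover characters if b is longer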
# [1] "e" "l"
and the unlikely case in my counter-example above also works as one might expect:
a <- "what the h"
b <- "what the hwh"
s.a <- strsplit(a, "")[[1]]
s.b <- strsplit(b, "")[[1]]
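common <- min(nchar(a), nchar(b))
i <- seq_len(common)  # seq_len() stays empty when common is 0, unlike 1:common
c(s.b[i][ s.b[i] != s.a[i] ],      # mismatches within the shared length
  s.a[seq_along(s.a) > common],    # leftover characters if a is longer
  s.b[seq_along(s.b) > common])    # leftover characters if b is longer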
# [1] "w" "h"
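Since the same steps are repeated for each pair of strings, they could be wrapped in a function. This is only a sketch: char_diff is a hypothetical name, it assumes a and b are single strings, and it uses seq_len and logical indexing so that a zero-length overlap doesn't produce odd indices:

```r
# char_diff is a hypothetical name; it assumes a and b are single strings
char_diff <- function(a, b) {
  s.a <- strsplit(a, "")[[1]]
  s.b <- strsplit(b, "")[[1]]
  common <- min(length(s.a), length(s.b))
  i <- seq_len(common)  # empty when common is 0, unlike 1:common
  c(s.b[i][s.b[i] != s.a[i]],       # mismatches within the shared length
    s.a[seq_along(s.a) > common],   # leftover characters if a is longer
    s.b[seq_along(s.b) > common])   # leftover characters if b is longer
}

char_diff("what the h", "what the hel")
# [1] "e" "l"
char_diff("what the h", "what the hwh")
# [1] "w" "h"
char_diff("what the hel", "what the h")
# [1] "e" "l"
```

The last call shows that the order of the arguments doesn't matter for the trailing characters, although mismatches inside the shared length are always reported from b.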