1

I have two columns of strings in a dataframe, and for each row I want to see the characters which differ.

E.g., given

Lines <- "
a     b
cat   car
dog   ding
cow   haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)

I want to return

a     b     diff
cat   car   t
dog   ding  o
cow   haw   co

I have seen

Extract characters that differ between two strings

as well as

Split comma-separated column into separate rows

where a number of neat solutions are given. These either work for an individual row (first reference) or act row-wise but don't do exactly what I want (second reference).

Ideally I'd like to use something like this:

Reduce(setdiff, strsplit(c(a, b), split = ""))

I tried:

apply(df, function(a,b) Reduce(setdiff, strsplit(c(a, b), split = "")))

but to no avail.

How can this be done?

P.S. I'm particularly keen to do this using dplyr if possible, but only for stylistic reasons.

TMrtSmith
    Your example is not reproducible. Please consider using `dput`. For example, we will get to see if you actually have character vectors or factor in your columns, which is a common source of confusion. – lmo Oct 12 '17 at 12:26

4 Answers

2

Assuming df is as shown reproducibly in the Note at the end, define a function Diff which accepts two vectors of characters, runs setdiff on them and pastes the result together. Then use mapply to run it on the two columns after splitting each string into individual characters.

Diff <- function(x, y) paste(setdiff(x, y), collapse = "")
transform(df, diff = mapply(Diff, strsplit(a, ""), strsplit(b, "")))

giving:

    a    b diff
1 cat  car    t
2 dog ding    o
3 cow  haw   co
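
To make clear what setdiff does on the split characters, here is a minimal sketch in base R. The variable names (a_chars, b_chars, only_in_a, only_in_b, dedup) and the "moon"/"man" pair are only illustrative and not part of the question. The result is set-based: it depends on the argument order, ignores character position, and reports each character once.

```r
# Split each word into single characters.
a_chars <- strsplit("cow", "")[[1]]   # "c" "o" "w"
b_chars <- strsplit("haw", "")[[1]]   # "h" "a" "w"

# setdiff() is asymmetric: characters of the first argument absent from the second.
only_in_a <- setdiff(a_chars, b_chars)   # "c" "o"
only_in_b <- setdiff(b_chars, a_chars)   # "h" "a"

# Repeated characters are reported once, because setdiff() works on sets.
dedup <- setdiff(strsplit("moon", "")[[1]], strsplit("man", "")[[1]])   # "o"
```

So the diff column holds the characters of a that do not occur anywhere in b, which matches the expected output in the question.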

Note: The input df used above is:

Lines <- "
a     b
cat   car
dog   ding
cow   haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
G. Grothendieck
2

A solution using the tidyverse and stringr (dt is defined under DATA below).

library(tidyverse)
library(stringr)

dt2 <- dt %>%
  mutate(a_list = str_split(a, pattern = ""), b_list = str_split(b, pattern = "")) %>%
  mutate(diff = map2(a_list, b_list, setdiff)) %>%
  mutate(diff = map_chr(diff, ~paste(., collapse = ""))) %>%
  select_if(~!is.list(.))
dt2
# A tibble: 3 x 3
      a     b  diff
  <chr> <chr> <chr>
1   cat   car     t
2   dog  ding     o
3   cow   haw    co
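
If the intermediate list columns are not needed, the same idea can probably be written more compactly. This is a sketch, assuming dplyr, purrr and stringr are installed. The name dt_compact is made up for illustration, and the data frame is rebuilt inline so the block runs on its own.

```r
library(dplyr)
library(purrr)
library(stringr)

# Same data as in the question, rebuilt here so the block is self-contained.
dt <- data.frame(a = c("cat", "dog", "cow"),
                 b = c("car", "ding", "haw"),
                 stringsAsFactors = FALSE)

# map2_chr() walks both lists of split characters in parallel and
# returns one collapsed string per row, so no list columns are created.
dt_compact <- dt %>%
  mutate(diff = map2_chr(str_split(a, ""), str_split(b, ""),
                         ~ paste(setdiff(.x, .y), collapse = "")))
```

Because map2_chr must return exactly one string per row, it fails loudly if the function ever returns something else, which is one reason to prefer it over map2 followed by map_chr.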

DATA

dt <- read.table(text = "a     b
cat   car
dog   ding
cow   haw",
                 header = TRUE, stringsAsFactors = FALSE)
www
1

Using dplyr

library(dplyr)
ff <- data.frame(a = c("dog", "chair", "love"), b = c("dot", "liar", "over"), stringsAsFactors = FALSE)
st <- ff %>% mutate(diff = sapply(Map(setdiff, strsplit(a, ""), strsplit(b, "")), paste, collapse = ""))

> st
      a    b diff
1   dog  dot    g
2 chair liar   ch
3  love over    l
0

Here is another base R method using Map.

diffList <- Map(setdiff, strsplit(dat[[1]], ""), strsplit(dat[[2]], ""))
diffList
[[1]]
[1] "t"

[[2]]
[1] "o"

[[3]]
[1] "c" "o"

You can wrap this in sapply to return a character vector for your data.frame:

dat$charDiffs <- sapply(diffList, paste, collapse = "")

which returns

dat
    a    b charDiffs
1 cat  car         t
2 dog ding         o
3 cow  haw        co
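
The Map and sapply steps above could also be wrapped in a small helper. This is only a sketch: char_diff is a made-up name, and vapply is swapped in for sapply so that the result is always a character vector, including for zero-length input.

```r
# char_diff() is a hypothetical helper name, not part of base R.
char_diff <- function(x, y) {
  # Split every string into single characters, then take the per-row set difference.
  diffs <- Map(setdiff, strsplit(x, ""), strsplit(y, ""))
  # vapply() guarantees a character vector, even for zero-length input.
  vapply(diffs, paste, character(1), collapse = "")
}

# Same data as in the question, rebuilt here so the block is self-contained.
dat <- data.frame(a = c("cat", "dog", "cow"),
                  b = c("car", "ding", "haw"),
                  stringsAsFactors = FALSE)
dat$charDiffs <- char_diff(dat$a, dat$b)
```

Rows where every character of a also occurs in b get an empty string rather than NA.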

data (from dput)

dat <- 
structure(list(a = c("cat", "dog", "cow"), b = c("car", "ding", 
"haw")), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
lmo