I have two factor columns, and I want to create a third column that tells me what the second one has that the first does not.
It's very similar to this post, but I'm having trouble going from a data frame to using the setdiff() function.
For example:
library(dplyr)
y1 <- c("a.b.","a.","b.c.d.")
y2 <- c("a.b.c.","a.b.","b.c.d.")
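# stringsAsFactors = TRUE keeps both columns as factors on R >= 4.0 as well
df <- data.frame(y1, y2, stringsAsFactors = TRUE)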
In the first row, column y1 has a.b. and column y2 has a.b.c., so I want a third column to return c. (or just c).
> df
      y1     y2 col3
1   a.b. a.b.c.   c.
2     a.   a.b.   b.
3 b.c.d. b.c.d.
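To make the goal concrete, here is a minimal sketch of the operation for the first row only, assuming each value is split on the literal dot. The names `first`, `second` and `row1` are only illustrative and are not part of my real code.

```r
# Row 1 only: split each value on a literal "." (fixed = TRUE avoids regex)
first  <- strsplit("a.b.",   split = ".", fixed = TRUE)[[1]]
second <- strsplit("a.b.c.", split = ".", fixed = TRUE)[[1]]

# Elements that the second value has and the first does not
row1 <- setdiff(second, first)
row1
```

The order of the arguments matters here: setdiff(second, first) keeps what is in `second` but not in `first`, which is the direction I am after.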
I think it should be a combination of strsplit() and setdiff(), but I can't get it to work.
I've tried converting the factors to character and then applying strsplit() to the result, but the output seems a bit weird to me. It seems to have created a list within a list, which makes it difficult to pass to setdiff():
#convert factor to character
df <- df %>% mutate_if(is.factor, as.character)
lapply(df$y1,function(x)(strsplit(x,split = "[.]")))
> lapply(df$y1,function(x)(strsplit(x,split = "[.]")))
[[1]]
[[1]][[1]]
[1] "a" "b"
[[2]]
[[2]][[1]]
[1] "a"
[[3]]
[[3]][[1]]
[1] "b" "c" "d"
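The nesting appears to come from the fact that strsplit() is already vectorised: it returns a list with one element per input string, so wrapping it in lapply() adds a second list level. The following is a minimal sketch of how the pieces might fit together on the toy data, assuming the columns are character and that an empty difference should become an empty string. The names `s1`, `s2` and `extra` are hypothetical helpers, and this is one possible approach rather than a definitive one.

```r
# Toy data from the question, kept as character for simplicity
df <- data.frame(
  y1 = c("a.b.", "a.", "b.c.d."),
  y2 = c("a.b.c.", "a.b.", "b.c.d."),
  stringsAsFactors = FALSE
)

# One call per column gives a flat list with one character vector per row
s1 <- strsplit(df$y1, split = ".", fixed = TRUE)
s2 <- strsplit(df$y2, split = ".", fixed = TRUE)

# Pair the rows up and keep what y2 has that y1 does not
extra <- Map(setdiff, s2, s1)

# Collapse each result back to the dotted form; an empty result becomes ""
df$col3 <- vapply(
  extra,
  function(x) if (length(x) > 0) paste0(paste(x, collapse = "."), ".") else "",
  character(1)
)
df
```

Map() is used here because it walks the two lists in parallel and always returns a list, whereas mapply() may simplify the result when every difference happens to have the same length.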