
Firstly, apologies if this has been asked elsewhere. I wasn't sure how to search for it, so I hope I'm not re-posting an existing question.

I am experiencing some strange behaviour in R when attempting to filter a data.table based on whether the value in one column exists in another column. I may not be going about this the best way, so I am open to guidance on that front; however, I would also like to better understand why R is behaving the way it is.

I have a data set:

library(data.table)
dt <- data.table(GRP  = rep(c("a", "b"), each = 4),
                 COLA = rep(c("Type C plus more", "Type C plus more",
                              "Type D then some", "Type D then some"), 2),
                 COLB = rep(c("Type C", "Type D"), 4))

#    GRP             COLA   COLB
# 1:   a Type C plus more Type C
# 2:   a Type C plus more Type D
# 3:   a Type D then some Type C
# 4:   a Type D then some Type D
# 5:   b Type C plus more Type C
# 6:   b Type C plus more Type D
# 7:   b Type D then some Type C
# 8:   b Type D then some Type D

I want to filter dt to the rows where the value in COLB occurs within COLA. I expected this to be some form of string or regex matching, so I thought grepl would be suitable:

dt[grepl(COLB,COLA)]

#    GRP             COLA   COLB
# 1:   a Type C plus more Type C
# 2:   a Type C plus more Type D
# 3:   b Type C plus more Type C
# 4:   b Type C plus more Type D

Even when I use fixed = TRUE, I get the same output.

Why is it that for COLA = "Type D then some" I always get FALSE, and for COLA = "Type C plus more" I always get TRUE?

For the record, grepl("Type D", "Type C plus more") on its own does return FALSE.
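A minimal base-R sketch of what appears to be going on. It uses plain vectors `cola` and `colb` (assumed stand-ins for the two columns, so it runs without data.table). `grepl()` is vectorised over `x` but not over `pattern`: when `pattern` has more than one element, only the first is used, with a warning. Every row is therefore tested against "Type C", which matches the TRUE/FALSE pattern in the output above, and `fixed = TRUE` cannot change that.

```r
cola <- rep(c("Type C plus more", "Type C plus more",
              "Type D then some", "Type D then some"), 2)
colb <- rep(c("Type C", "Type D"), 4)

# Only colb[1] ("Type C") is used as the pattern; R warns that the
# remaining elements of 'pattern' are ignored (suppressed here).
res <- suppressWarnings(grepl(colb, cola))

# Same result as matching every row against "Type C" alone
same <- grepl("Type C", cola)
print(identical(res, same))
```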

Dan
    Try `dt[ Vectorize(grepl)( COLB, COLA ) ]`. I think this Q&A might also be relevant: http://stackoverflow.com/q/35660709/1191259 – Frank Oct 18 '16 at 01:58
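A rough sketch of why the suggestion works, again using assumed stand-in vectors `cola` and `colb` rather than the data.table. `Vectorize(grepl)` wraps `grepl()` in `mapply()`, so `colb[i]` is matched against `cola[i]` row by row. The direct `mapply()` call is shown alongside as an equivalent way to write the same pairwise match.

```r
cola <- rep(c("Type C plus more", "Type C plus more",
              "Type D then some", "Type D then some"), 2)
colb <- rep(c("Type C", "Type D"), 4)

# Vectorize() calls grepl() once per (pattern, x) pair via mapply();
# the result is named by colb, so drop the names for a plain logical.
vgrepl   <- Vectorize(grepl)
keep_vec <- unname(vgrepl(colb, cola, fixed = TRUE))

# The same pairwise match written directly with mapply()
keep_map <- mapply(grepl, colb, cola,
                   MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

print(keep_vec)
# With data.table, a vector like this can be used as the row filter,
# e.g. dt[Vectorize(grepl)(COLB, COLA, fixed = TRUE)]
```

Note that each row costs one separate `grepl()` call, so this may be slow on very large tables.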
    That certainly does the trick. I did notice that on my real data set, which contains `NA`s, it throws an error. Adding an `!is.na()` check as well solved it. – Dan Oct 18 '16 at 02:11
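One possible way to fold the `!is.na()` guard into the pairwise match is sketched below. `contains_pairwise` is a hypothetical helper, not part of base R or data.table. It assumes that a missing value in either column should count as "no match" (FALSE); you may prefer to keep NA instead.

```r
# Hypothetical helper: TRUE where pattern[i] occurs in x[i];
# rows with NA in either vector are treated as FALSE (an assumption).
contains_pairwise <- function(pattern, x, fixed = TRUE) {
  stopifnot(length(pattern) == length(x))
  out <- logical(length(x))                      # start with all FALSE
  idx <- which(!is.na(pattern) & !is.na(x))      # rows safe to test
  out[idx] <- vapply(idx,
                     function(i) grepl(pattern[i], x[i], fixed = fixed),
                     logical(1))
  out
}

cola <- c("Type C plus more", NA, "Type D then some", "Type D then some")
colb <- c("Type C", "Type D", NA, "Type D")
print(contains_pairwise(colb, cola))
# With data.table: dt[contains_pairwise(COLB, COLA)]
```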
