4

I have a data.table like this one, but with many more columns:

library(data.table)
the_dt = data.table(DetailCol1=c("Deets1","Deets2","Deets3","Deets4"), DetailCol2 = c("MoreDeets1","MoreDeets2","MoreDeets3","MoreDeets4"), DataCol1=c("ARP","AARPP","ABC","ABC"), DataCol2=c("ABC","ABC","ABC","ARPe"), DataCol3 = c("ABC", "ARP", "ABC","ABC"))

I want to retrieve DetailCol1 of only those rows that contain a match to the string 'ARP'.

This question was useful in pointing me to like, but I'm still not sure how do this for multiple columns, especially if there are dozens of columns in which I would like to search.

For instance, this is how I could search within DataCol1 the_dt[DataCol1 %like% 'ARP',DetailCol1], but how would I conduct the same search in DataCols 1-100?

Community
  • 1
  • 1
Atticus29
  • 4,190
  • 18
  • 47
  • 84
  • [This question](http://stackoverflow.com/questions/24250878/selecting-multiple-columns-in-data-frame-using-partial-column-name) may help. Refers to [grep](https://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html) – varontron Oct 02 '16 at 01:09
  • If you are applying this %like% on all the 'DataCol's , in this example, what would be the expected output – akrun Oct 02 '16 at 01:12
  • Varontron, grepping the colnames() might be useful if I can avoid having to use with=F in the data.table... – Atticus29 Oct 02 '16 at 01:16
  • Akrun, in the code above, I would get deets1 and deets2. Applying the same search across all data columns should also return deets 4, if that makes sense. – Atticus29 Oct 02 '16 at 01:18
  • Is it kind of like the matches that are common from all the searches – akrun Oct 02 '16 at 01:22
  • 1
    you might want to `melt` your `data.table` so that you can just search one column... – MichaelChirico Oct 02 '16 at 01:22
  • I want to get all of the detailCol1 rows for which a match exists in that row in any column. – Atticus29 Oct 02 '16 at 01:23
  • In your first DataCol1, 'ARP', match is at position1 & 2, 2nd in 4th and 3rd in 2nd. – akrun Oct 02 '16 at 01:23
  • Sorry. I am not at my computer right now. Searching just datacol1 should give me just row2, you are correct. But I want to search more than just datacol 1 – Atticus29 Oct 02 '16 at 01:26
  • In your examples, the 'datacol1' 1st element is "ARP" and 2nd is "AARPP". So the `%like%` gives both 1st and 2nd match as TRUE. I posted a solution below, please check if that helps – akrun Oct 02 '16 at 01:29

1 Answers1

4

We can specify the columns to compare in .SDcol, loop through it with lapply, convert it to logical using %like%, check whether there is at least one TRUE per each row using Reduce, use that to subset the elements from 'DetailCol1'.

the_dt[the_dt[, Reduce(`|`, lapply(.SD, `%like%`, "ARP")),
                       .SDcols= DataCol1:DataCol3], DetailCol1]
akrun
  • 874,273
  • 37
  • 540
  • 662