How to find the row of a data.table containing the most matches of a query vector

Question

I have a data.table like this:

library(data.table)
ffDummy_dt = data.table(
  Annotation = c("chr10:10..20,-", "chr10:25..30,-", "chr10:35..100,-",
                 "chr10:106..205,-", "chr10:223..250,-", "chr10:269..478,-",
                 "chr10:699..1001,-", "chr10:2000..2210,-", "chr10:2300..2500,-",
                 "chr10:2678..5678,-"),
  tpmOne   = c(0, 0, 0.213, 1, 1.2, 0.5, 0.7, 0.9, 0.8, 0.86),
  tpmTwo   = c(100, 1000, 1001, 1500, 900, 877, 1212, 1232, 1312, 0),
  tpmThree = c(0.2138595, 0, 0, 0, 0, 0, 0.6415786, 0, 0, 0)
)

I want to pass a query (a vector, or even a data.table if need be) such as:

test_v = c(0,0,0.86)

and find out which row is the best match, i.e. the row containing the most elements of test_v.

In my real use case, test_v has about 20 elements and nrow(ffDummy_dt) is much greater than 20 (though there will likely be only one perfect match per 20-element vector).

Currently,

which.max(apply(as.matrix(ffDummy_dt[, 2:ncol(ffDummy_dt), with = FALSE]), 1,
                function(k) sum(test_v %in% k)))

seems to work (it returns the correct row in this case, 10), but it is not a data.table solution.
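A minimal sketch of the same count without the row-by-row apply(): it loops only over the query values and the columns, and compares whole columns at once, so each row i ends up with sum(test_v %in% row_i). `tpm_dt` is a hypothetical name for the numeric part of ffDummy_dt (Annotation is dropped because it is never compared). This has not been benchmarked on a large table.

```r
library(data.table)

# Numeric columns of the question's table (hypothetical name tpm_dt)
tpm_dt <- data.table(
  tpmOne   = c(0, 0, 0.213, 1, 1.2, 0.5, 0.7, 0.9, 0.8, 0.86),
  tpmTwo   = c(100, 1000, 1001, 1500, 900, 877, 1212, 1232, 1312, 0),
  tpmThree = c(0.2138595, 0, 0, 0, 0, 0, 0.6415786, 0, 0, 0)
)
test_v <- c(0, 0, 0.86)

# For each query value v, flag the rows where any column equals v exactly,
# then add the flags up across query values
counts <- Reduce(`+`, lapply(test_v, function(v) {
  Reduce(`|`, lapply(tpm_dt, function(col) col == v))
}))

best_row <- which.max(counts)
print(counts)    # 2 2 2 2 2 2 0 2 2 3
print(best_row)  # 10
```

The work grows with length(test_v) times the number of columns rather than with the number of rows in an R-level loop, which is the part that hurts when nrow is large.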

I've had a look here, but can't quite figure out how to express the `%in% k` step above in data.table syntax.

So you're saying the order of the elements in `test_v` makes no difference? If so, that's a messy problem. — Frank, Oct 04 '16 at 19:52
Ok, well, I guess you are in for a lot of difficulty. For one thing, try `.1 + .2 == .3` and then maybe read http://stackoverflow.com/q/9508518/ If you were looking for integers or strings or something, this would be doable, though. — Frank, Oct 04 '16 at 19:59
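A short illustration of the floating-point issue Frank raises, plus one possible workaround: compare with a tolerance instead of exact equality. `count_near` and its absolute tolerance of 1e-8 are hypothetical choices for this sketch, not part of any package; the tolerance should be chosen to suit the data.

```r
# Exact comparison of doubles can fail after arithmetic
print(.1 + .2 == .3)                   # FALSE
print(isTRUE(all.equal(.1 + .2, .3)))  # TRUE

# Hypothetical helper: how many query values have a match in row k
# within an absolute tolerance tol
count_near <- function(query, k, tol = 1e-8) {
  sum(vapply(query, function(v) any(abs(k - v) < tol), logical(1)))
}

k <- c(0.1 + 0.2, 0, 100)  # a row whose first value came from arithmetic
query <- c(0.3, 0)

exact_count <- sum(query %in% k)      # 0.3 is not found exactly
near_count <- count_near(query, k)
print(exact_count)  # 1
print(near_count)   # 2
```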
So the matches are not exclusive, and you really want the number of matches to be 2 for the first row? — eddi, Oct 05 '16 at 15:55
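The two counting directions differ on the first row, which is what eddi is asking about. A small sketch using the first row's tpm values (the name `row1` is introduced here for illustration):

```r
test_v <- c(0, 0, 0.86)
row1 <- c(0, 100, 0.2138595)  # tpm values of the first row

# Question's count: query elements that occur somewhere in the row
q_in_row <- sum(test_v %in% row1)  # both 0s in test_v match the single 0
# Other direction: row cells that occur somewhere in the query
row_in_q <- sum(row1 %in% test_v)  # the row's single 0 is counted once
print(q_in_row)  # 2
print(row_in_q)  # 1
```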

score 0 · Answer 1 · answered Oct 05 '16 at 16:01


Assuming you actually want the matches to be exclusive (that seems to me to make more sense when picking a "best match" row), you can count, for each row, how many of its cells appear in test_v:

Reduce(`+`, lapply(ffDummy_dt, `%in%`, test_v))
#[1] 1 2 1 1 1 1 0 1 1 3
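The expression above returns one count per row; which.max() turns it into the row index the question asks for. A sketch that also restricts the comparison to the numeric columns with `.SDcols` (an assumption: Annotation should be ignored; here it never matches anyway, since its strings are not in test_v):

```r
library(data.table)

ffDummy_dt <- data.table(
  Annotation = c("chr10:10..20,-", "chr10:25..30,-", "chr10:35..100,-",
                 "chr10:106..205,-", "chr10:223..250,-", "chr10:269..478,-",
                 "chr10:699..1001,-", "chr10:2000..2210,-", "chr10:2300..2500,-",
                 "chr10:2678..5678,-"),
  tpmOne   = c(0, 0, 0.213, 1, 1.2, 0.5, 0.7, 0.9, 0.8, 0.86),
  tpmTwo   = c(100, 1000, 1001, 1500, 900, 877, 1212, 1232, 1312, 0),
  tpmThree = c(0.2138595, 0, 0, 0, 0, 0, 0.6415786, 0, 0, 0)
)
test_v <- c(0, 0, 0.86)

# Same per-row count as above, computed only over the tpm columns
counts <- ffDummy_dt[, Reduce(`+`, lapply(.SD, `%in%`, test_v)),
                     .SDcols = c("tpmOne", "tpmTwo", "tpmThree")]

best_row <- which.max(counts)  # first row with the highest count
print(counts)    # 1 2 1 1 1 1 0 1 1 3
print(best_row)  # 10
```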


eddi

