This should be simple to solve, but I am stuck.
I have a data.table on which I need to perform a complex, range-based filter (see the example below). Essentially it is akin to Excel's INDEX/MATCH, but I need to do it with the lowest memory footprint possible.
I am building a two-dimensional lookup function. Think of v1 as a category: it can only be "one", "two", or "three". Think of v2 as a set of thresholds that split the number line into ranges. I first specify v1; that match is always exact (`==`). Then I feed a numeric value such as 6 and want to pick up the corresponding v3. The catch is that a naive filter like `exampledt[v1 == "one" & v2 >= 6, v3]` matches at least two rows. That is the problem: I want to exploit the table's index to avoid spelling out every `if (v2[i] >= ...)` permutation. In other words, I need to allocate 6 to one of the ranges [16, Inf), [6, 16), (-Inf, 6).
If I feed "one" and 6, I want to be given the value at i=1, j=2.
If I feed "one" and 5, I want to be given the value at i=1, j=3.
If I feed "two" and 17, I want to be given the value at i=2, j=2.
I would prefer to avoid the 5.9999 hack (storing 5.9999 as a stand-in for "anything just below 6"), and it needs to handle the boundary cases as well. How would I express the inequalities? Could I perhaps exploit data.table's keyed indexing?
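To make the range allocation concrete, here is the semantics I am after, expressed in base-R terms with findInterval for category "one" (the breaks vector here is just my own illustration of the ranges, not part of the real table):

```r
# Ranges for category "one", as ascending lower bounds:
# (-Inf, 6) -> j = 3, [6, 16) -> j = 2, [16, Inf) -> j = 1
breaks <- c(6, 16)

# findInterval() is left-closed by default, so a boundary value lands
# in the range whose lower bound it equals:
3 - findInterval(6,  breaks)   # j = 2: 6 falls in [6, 16)
3 - findInterval(5,  breaks)   # j = 3: 5 falls in (-Inf, 6)
3 - findInterval(17, breaks)   # j = 1: 17 falls in [16, Inf)
```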
library(data.table)

exampledt <- data.table(
  v1 = c("one",   "one",   "one",
         "two",   "two",   "two",
         "three", "three", "three"),
  v2 = c(16, 6, 5.9999,
         NA, 6, 5.9999,
         NA, 22, 21.9999),
  v3 = letters[1:9]
)
# I tried this, but the edge cases need additional logic, which I want to avoid
exampledt[v1 == "one" & v2 >= 6, v3]
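For what it's worth, one direction I have been exploring is a keyed rolling join. This sketch assumes v2 can be reinterpreted as the lower bound of each range, with -Inf replacing the 5.9999 stand-in and the NA rows dropped as "range absent for this category" — it is only what I mean by exploiting the index, not a confirmed solution:

```r
library(data.table)

# Assumption: v2 is the lower bound of each range; the catch-all row's
# true lower bound is -Inf (instead of the 5.9999 stand-in), and NA
# marks a range that does not exist for that category.
lut <- data.table(
  v1 = rep(c("one", "two", "three"), each = 3),
  v2 = c(16, 6, -Inf,
         NA, 6, -Inf,
         NA, 22, -Inf),
  v3 = letters[1:9]
)
lut <- lut[!is.na(v2)]   # drop absent ranges
setkey(lut, v1, v2)      # sorted key, so the join can use the index

# roll = TRUE (LOCF): match the largest v2 that is <= the fed value
lookup <- function(category, x) lut[.(category, x), roll = TRUE, v3]

lookup("one", 6)    # "b" -> i = 1, j = 2
lookup("one", 5)    # "c" -> i = 1, j = 3
lookup("two", 17)   # "e" -> i = 2, j = 2
```

The boundary case works without the 5.9999 hack because the rolling join matches an exact v2 value before rolling, so feeding exactly 6 lands on the v2 == 6 row.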