2

Say I have two data table dm and dn:

library(data.table)
set.seed(12)
dates = seq.Date(as.Date('2015-09-01'),as.Date('2015-11-01'), 2)
dm = data.table(user=sample(LETTERS[1:4], 10, replace=T),
                time=sample(dates, 10)) 

dn = data.table(user=sample(LETTERS[1:8], 3, replace=F),
                start=c(as.Date('2015-09-01'), as.Date('2015-10-05'),  
                        as.Date('2015-09-14')),
                end=c(as.Date('2015-10-30'), as.Date('2015-11-01'),                    
                      as.Date('2015-10-20')))

>dm
#     user       time
#  1:    A 2015-09-25
#  2:    D 2015-10-19
#  3:    D 2015-09-21
#  4:    B 2015-10-27
#  5:    A 2015-09-15
#  6:    A 2015-09-23
#  7:    A 2015-10-21
#  8:    C 2015-10-31
#  9:    A 2015-10-01
# 10:    A 2015-09-05


>dn
#    user      start        end
# 1:    B 2015-09-01 2015-10-30
# 2:    F 2015-10-05 2015-11-01
# 3:    A 2015-09-14 2015-10-20

How can one do to subset dm based on the columns of dn? For example, for each user in dn, we look up dm for the matched user and subset the rows having time falling between the user's time interval [start, end], if there is any.

In this example, the desired outcome is

    user       time      start        end
 5:    A 2015-09-15 2015-09-14 2015-10-20
 6:    A 2015-09-23 2015-09-14 2015-10-20
 9:    A 2015-10-01 2015-09-14 2015-10-20
10:    A 2015-09-05 2015-09-14 2015-10-20
 4:    B 2015-10-27 2015-09-01 2015-10-30

The row number is retained just for illustration, and the order of time doesn't matter.

Francis
  • 6,416
  • 5
  • 24
  • 32

1 Answers1

3

You can try:

setkey(dm,user)
dm[dn][time>start & time<end]   
#   user       time      start        end
#1:    A 2015-09-25 2015-09-14 2015-10-20
#2:    A 2015-09-15 2015-09-14 2015-10-20
#3:    A 2015-09-23 2015-09-14 2015-10-20
#4:    A 2015-10-01 2015-09-14 2015-10-20
#5:    B 2015-10-27 2015-09-01 2015-10-30
nicola
  • 24,005
  • 3
  • 35
  • 56