0

I've been trying to figure this out for some time with no success. I've tried using data.table, merge, and the apply functions, however I am a beginner and may have missed something. Basically I want to use the datetime ("dt") information and unique event identifier ("uid") from one data set to create a column telling me whether or not the conditions were met and at which "sid" that occurred. What is complicated is that for a given "dt" and "uid" there may have been an occurrence at more than one "sid" and I want to know that. My datasets look as follows...

     > head(D1)
      X                  dt       tid     rid     sid   uid  nutm     eutm
    1 1 2013-09-06 08:55:00 A69-1601-27 VR2W-11 BUIS_09  1 1932024 389346.7
    2 2 2013-09-06 08:55:00 A69-1601-27 VR2W-12 BUIS_08  1 1932024 389346.7
    3 3 2013-09-06 08:56:00 A69-1601-27 VR2W-11 BUIS_09  1 1932024 389346.7
    4 4 2013-09-06 08:57:00 A69-1601-27 VR2W-11 BUIS_09  1 1932024 389346.7
    5 5 2013-09-06 08:57:00 A69-1601-27 VR2W-12 BUIS_08  1 1932024 389346.7
    6 6 2013-09-06 08:57:00 A69-1601-27 VR2W-11 BUIS_09  1 1932024 389346.7

> head(D2)
  X                  dt uid
1 1 09/06/2013 08:57:00   1
2 2 09/06/2013 08:58:00   1
3 3 09/06/2013 08:59:00   1
4 4 09/06/2013 09:00:00   1
5 5 09/06/2013 09:01:00   1
6 6 09/06/2013 09:02:00   1

I want to run a function that starting with the first row of D2 looks at "uid" and "dt" of each row of D1 and creates a column stating whether those conditions were met and the associated "sid" for which those conditions were met. So my output would look something like this...

head(stimes)
  X                  dt uid  freq sid     tid            rid     nutm    eutm
1 1 09/06/2013 08:57:00   1    T  BUIS_09 A69-1601-27 VR2W-11 1932024 389346.7
1 2 09/06/2013 08:57:00   1    T  BUIS_08 A69-1601-27 VR2W-12 1967024 389346.7

Update: the dput(head(D1)) looks as follows. It's a little difficult because there are a huge number of unique date times but in general...

        "2014-05-03 09:12:00", "2014-05-03 09:13:00", "2014-05-03 09:14:00", 
        "2014-05-03 09:15:00", "2014-05-03 09:16:00", "2014-05-03 09:17:00", 
        "2014-05-03 09:18:00", "2014-05-03 09:19:00", "2014-05-03 09:20:00", 
        "2014-05-03 09:21:00", "2014-05-03 09:22:00"), class = "factor"),     
     tid = structure(c(2L, 2L, 3L, 2L, 2L, 2L), .Label = c("A69-1303-54", 
        "A69-1601-27", "A69-1601-275", "A69-1601-31"), class = "factor"), 
        rid = structure(c(30L, 30L, 30L, 30L, 30L, 30L), .Label = c("VR2W-102913", 
        "VR2W-914", "VR2W-916", "VR2W-102917", "VR2W-210", 
        "VR2W-710", "VR2W-711", "VR2W-103821", "VR2W-088", 
        "VR2W-620", "VR2W-621", "VR2W-104622", "VR2W-938", 
        "VR2W-320", "VR2W-145", "VR2W-110204", "VR2W-205", 
        "VR2W-207", "VR2W-208", "VR2W-110210", "VR2W-634", 
        "VR2W-118", "VR2W-119", "VR2W-114120", "VR2W-121", 
        "VR2W-122", "VR2W-123", "VR2W-114124", "VR2W-125", 
        "VR2W-498", "VR2W-499", "VR2W-115500", "VR2W-501", 
        "VR2W-502", "VR2W-503", "VR2W-115504", "VR2W-925", 
        "VR2W-272", "VR2W-273", "VR2W-120967", "VR2W-585"
        ), class = "factor"), sid = structure(c(9L, 9L, 9L, 9L, 9L, 
        9L), .Label = c("BUIS_01", "BUIS_02", "BUIS_03", "BUIS_04", 
        "BUIS_05", "BUIS_06", "BUIS_07", "BUIS_08", "BUIS_09", "BUIS_10", 
        "BUIS_11", "BUIS_12", "BUIS_13", "BUIS_14", "BUIS_15", "BUIS_16", 
        "BUIS_17", "BUIS_18", "BUIS_19", "BUIS_20", "BUIS_22", "BUIS_26", 
        "BUIS_27", "BUIS_31", "BUIS_33", "BUIS_34", "BUIS_35", "BUIS_36", 
        "BUIS_37", "BUIS_38", "BUIS_41", "BUIS_42", "BUIS_43", "BUIS_46"
        ), class = "factor"), uid = c(NA, NA, NA, 1L, 1L, 1L), nutm = c(NA, 
        NA, NA, 1967024.456, 1967024.456, 1967024.456), eutm = c(NA, 
        NA, NA, 328346.6957, 328346.6957, 328346.6957)), .Names = c("X", 
    "dt", "tid", "rid", "sid", "uid", "nutm", "eutm"), row.names = c(NA, 
    6L), class = "data.frame")
  • Instead or in addition to `head(D1)` could you provide `dput(head(D1))` so we can have a better look at the data? – Frank May 08 '15 at 18:22
  • Also, not sure what your condition is and so cannot make the `freq` column. – Frank May 08 '15 at 18:26
  • Finally, your output is unrelated to your input (introducing new values of `nutm` for example). Your question should be self-contained: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Frank May 08 '15 at 18:30
  • 1
    Apologies. I'll work on getting this into a more generalized structure that is more suitable for feedback. Thanks for the insight. – Espenshade May 08 '15 at 18:40
  • Ok cool. Gotta put the whole `dput` result, though, starting with `structure(` – Frank May 08 '15 at 18:42
  • Oh, just noticed the comment about a lot of unique date-times. If you convert to date-time up front that won't be a problem. `D1$dt <- as.POSIXct(D1$dt)` ... and similarly for `D2` for which we'd also need to see a `dput`, I guess. – Frank May 08 '15 at 19:02

1 Answers1

2

If these were data.tables (they aren't),

setkey(D1,dt,uid)
setkey(D2,dt,uid)

D1[D2,nomatch=0]

might do it.

Frank
  • 66,179
  • 8
  • 96
  • 180