
I don't know what's going on here. Do I have some logical flaw in the code?

I want to match two datasets by their time difference. In one case, an entry is about 4 hours later than the corresponding entry in the other set. I calculate the difference like this:

qnr$submitdate[10]-raw1$time[7]
Time difference of 4 hours

I am specifying a time window:

sum(qnr$submitdate[10]-raw1$time[7] <= 4 & qnr$submitdate[10]-raw1$time[7] > 3.995)
[1] 1

Perfect, 1 match! But when I consider the whole data set, I get 0 matches. How can that be?

sum(qnr$submitdate[10]-raw1$time <= 4 & qnr$submitdate[10]-raw1$time > 3.995)
[1] 0

Specifically, I want to match an identifier:

for (i in 1:nrow(qnr)){
 match <- raw1$subject[(qnr$submitdate[i]-raw1$time <= 4 & qnr$submitdate[i]-raw1$time > 3.995)]
 if(length(match)>0) qnr$subject[i] <- match
}

This works, but only for some cases, not the one mentioned above. Can someone please help me understand why?

Data:

qnr <- structure(list(submitdate = structure(c(1635427498, 1635427876, 
1635428218, 1635429757, 1635430844, 1635432380, 1635435962, 1635453487, 
1635464448, 1635508264, 1635509440, 1635509727, 1635510277, 1635511263, 
1635511718, 1635514199, 1635514329, 1635517928, 1635519441, 1635519704, 
1635520386, 1635521108, 1635522747, 1635525148, 1635526577), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), subject = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
-25L), class = c("tbl_df", "tbl", "data.frame"))

raw1 <- structure(list(time = structure(c(1635413099, 1635413819, 1635416446, 
1635417980, 1635421563, 1635439088, 1635493864, 1635495041, 1635495326, 
1635495876, 1635496863, 1635499803, 1635499932, 1635503528, 1635505042, 
1635508347, 1635512177, 1635512850, 1635518752, 1635519382), class = c("POSIXct", 
"POSIXt"), tzone = ""), subject = c("9wtd4kldpun6bhgq", "qbvhqxuw67x1eduw", 
"k2dc9c88t3jcfssy", "vmvwfc6z7j236nhk", "7qo7ra1jj25ue3fb", "5xx9qkkb53nzxev5", 
"o6zaaq469c7t2jps", "dfsj021ojphza6uc", "4k0l4a3yrb33hel1", "vf6usaa0cl8kz17t", 
"f1wwfeoeekoru88z", "oe8e2u6w4a1f6f6m", "tnxxywtpsj8nejoa", "zht8w1bfhq4dk22l", 
"atd314r9a4htlaal", "mwbh9eafxczk0x8u", "ke7m4qqp4aodd1fb", "v13fx76lsohsa1hh", 
"8kvynhcvfs09g658", "5scqtdz8ha8cuxt1")), row.names = c(79226L, 
26641L, 79425L, 79624L, 79823L, 26789L, 2961L, 3109L, 3257L, 
47585L, 3405L, 3553L, 3701L, 47784L, 3849L, 47983L, 48182L, 48381L, 
48580L, 48779L), class = c("data.table", "data.frame"))
MoDe
  • You are more likely to get a good answer if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) or [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) including a sample input. In your case, please include a part of `qnr` and `raw1` using `dput(qnr)` or `dput(head(qnr, 20))`. – Martin Gal Dec 20 '21 at 18:46
  • You are right, of course, edited the question. – MoDe Dec 20 '21 at 18:59
  • For some values of `qnr` (for example, `i <- 7`), the condition `qnr$submitdate[i]-raw1$time <= 4 & qnr$submitdate[i]-raw1$time > 3.995` returns `FALSE`. Therefore `raw1` doesn't return a match. – Martin Gal Dec 20 '21 at 19:23
  • I cannot reproduce your results. `qnr$submitdate[10]-raw1$time[7] <= 4 & qnr$submitdate[10]-raw1$time[7] > 3.995` returns a single true, and `qnr$submitdate[10]-raw1$time <= 4 & qnr$submitdate[10]-raw1$time > 3.995` returns 9 falses and 1 true. Please include your expected output (actual values), since the nonreproducibility of your second code block suggests the rest of your code may differ between your console and ours. – r2evans Dec 20 '21 at 19:33
  • You're right, it works with that data. Updated the data to 20 and 25 rows where it doesn't work anymore. Maybe the reason is the different length of the sets? – MoDe Dec 20 '21 at 19:41
  • I think I have it. The time difference is not always expressed in hours, but can also be expressed in days, e.g. `qnr$submitdate[10]-raw1$time[1]` gives `Time difference of 1.101447 days`. – MoDe Dec 20 '21 at 20:04

1 Answer


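The reason is that subtracting two date-times calls `difftime()` with its default `units = "auto"`, so the unit of the result (seconds, minutes, hours, or days) depends on the values being subtracted and is not always hours.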
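A small sketch of the unit switching, using three timestamps copied from the question's data. The variable names (`submit`, `times`, `u_far`, `u_match`, `u_vector`, `n_hits`) are made up for this illustration. As far as I understand `difftime()`, with `units = "auto"` the unit for a whole vector is picked from the smallest absolute difference, which would explain why the vectorised comparison found nothing: the result was in minutes, so the 4-hour gap showed up as 240.

```r
# Values copied from the question's data (seconds since 1970-01-01 UTC).
submit <- as.POSIXct(1635508264, origin = "1970-01-01", tz = "UTC")  # qnr$submitdate[10]
times  <- as.POSIXct(c(1635413099, 1635493864, 1635508347),
                     origin = "1970-01-01", tz = "UTC")              # raw1$time[c(1, 7, 16)]

# With plain subtraction, the unit depends on the values involved.
u_far    <- units(submit - times[1])  # "days":  95165 seconds apart
u_match  <- units(submit - times[2])  # "hours": exactly 14400 seconds apart
u_vector <- units(submit - times)     # "mins":  the smallest gap is 83 seconds

# Fixing the unit makes the numeric window mean what was intended.
d <- as.numeric(difftime(submit, times, units = "hours"))
n_hits <- sum(d <= 4 & d > 3.995)     # 1
```

Comparing a `difftime` with a bare number such as `4` ignores the unit, which is why the mismatch goes unnoticed without an error or warning.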

It works after changing

qnr$submitdate[i]-raw1$time

to

difftime(qnr$submitdate[i],raw1$time, units="hours")
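For completeness, here is a sketch of the matching loop from the question with that change applied. It runs on a cut-down copy of the data (two rows of `qnr`, three rows of `raw1`, values taken from the question) so it is self-contained. The names `d` and `hit`, and the choice to assign only when exactly one row matches, are my own assumptions and not part of the original code.

```r
# Cut-down copies of the question's data (rows 10-11 of qnr; rows 7, 8, 16 of raw1).
qnr <- data.frame(
  submitdate = as.POSIXct(c(1635508264, 1635509440),
                          origin = "1970-01-01", tz = "UTC"),
  subject = NA_character_
)
raw1 <- data.frame(
  time = as.POSIXct(c(1635493864, 1635495041, 1635508347),
                    origin = "1970-01-01", tz = "UTC"),
  subject = c("o6zaaq469c7t2jps", "dfsj021ojphza6uc", "mwbh9eafxczk0x8u")
)

for (i in seq_len(nrow(qnr))) {
  # Always measure in hours, whatever the size of the gaps.
  d <- as.numeric(difftime(qnr$submitdate[i], raw1$time, units = "hours"))
  hit <- raw1$subject[d <= 4 & d > 3.995]
  # Assumption: skip ambiguous cases instead of assigning several values.
  if (length(hit) == 1) qnr$subject[i] <- hit
}
```

With these rows, both `qnr` entries find their partner (gaps of 14400 and 14399 seconds), while the third `raw1` row, only about a minute away from the first submission, no longer changes the unit of the comparison.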
MoDe