
There is plenty of material on Stack Overflow about calculating time differences between rows/entries/observations. However, I'm stumped as to why I'm getting NAs in unusual positions.

I have 3 columns: DATETIME, which is POSIXlt; GRP800, which is the group (factor); and TIME800, which is supposed to represent the time elapsed between consecutive observations within each group. My code was adapted from the question "Calculate differences between rows faster than a for loop?":

df$TIME800<-unlist(by(df$DATETIME,df$GRP800,function(x)c(NA,diff(x))))

It appears to work correctly for the first group, but then I get NAs in the middle of the second group (rows 17 and 19 below). I've tried several approaches using diff and they all produce identical output. I'm quite puzzled. Any advice would be greatly appreciated.

              DATETIME GRP800  TIME800
1  2013-07-16 16:01:30      1       NA
2  2013-07-16 20:00:54      1 3.990000
3  2013-07-17 00:01:30      1 4.010000
4  2013-07-17 04:01:00      1 3.991667
5  2013-07-17 08:00:50      1 3.997222
6  2013-07-17 12:01:46      1 4.015556
7  2013-07-17 16:00:50      1 3.984444
8  2013-07-17 20:01:00      1 4.002778
9  2013-07-18 00:01:18      1 4.005000
10 2013-07-18 04:01:02      1 3.995556
11 2013-07-18 08:00:50      1 3.996667
12 2013-07-18 12:01:18      2       NA
13 2013-07-18 16:01:02      2 3.970833
14 2013-07-18 20:00:59      2 4.007500
15 2013-07-19 00:01:31      2 3.997222
16 2013-07-19 04:01:18      2 4.011111
17 2013-07-19 08:01:02      2       NA
18 2013-07-19 12:01:57      2 2.007500
19 2013-07-19 20:01:00      2       NA
20 2013-07-20 00:01:00      2 2.003333
> dput(df[1:20,])
structure(list(DATETIME = structure(list(sec = c(30, 54, 30, 
0, 50, 46, 50, 0, 18, 2, 50, 18, 2, 59, 31, 18, 2, 57, 0, 0), 
    min = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 
    0L, 1L, 1L, 1L, 1L, 1L, 1L), hour = c(16L, 20L, 0L, 4L, 8L, 
    12L, 16L, 20L, 0L, 4L, 8L, 12L, 16L, 20L, 0L, 4L, 8L, 12L, 
    20L, 0L), mday = c(16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 
    18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 20L
    ), mon = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), year = c(113L, 113L, 113L, 
    113L, 113L, 113L, 113L, 113L, 113L, 113L, 113L, 113L, 113L, 
    113L, 113L, 113L, 113L, 113L, 113L, 113L), wday = c(2L, 2L, 
    3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
    5L, 5L, 6L), yday = c(196L, 196L, 197L, 197L, 197L, 197L, 
    197L, 197L, 198L, 198L, 198L, 198L, 198L, 198L, 199L, 199L, 
    199L, 199L, 199L, 200L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
    zone = c("MDT", "MDT", "MDT", "MDT", "MDT", "MDT", "MDT", 
    "MDT", "MDT", "MDT", "MDT", "MDT", "MDT", "MDT", "MDT", "MDT", 
    "MDT", "MDT", "MDT", "MDT"), gmtoff = c(NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst", 
"zone", "gmtoff"), class = c("POSIXlt", "POSIXt")), GRP800 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), TIME800 = c(NA, 3.99, 4.01, 3.991666667, 3.997222222, 
4.015555556, 3.984444444, 4.002777778, 4.005, 3.995555556, 3.996666667, 
NA, 3.970833333, 4.0075, 3.997222222, 4.011111111, NA, 2.0075, 
NA, 2.003333333)), .Names = c("DATETIME", "GRP800", "TIME800"
), row.names = c(NA, 20L), class = "data.frame")
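One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: `by()` returns its results in factor-level order, and `unlist()` simply stacks them, so the stacked vector lines up with the rows of `df` only if `df` is already sorted by `GRP800` in level order (character levels such as "1", "10", "2" sort alphabetically, not numerically). The toy data below are hypothetical. `split()` + `lapply()` stands in for `by()` because it stacks the groups in the same level order, and `ave()` is a swapped-in alternative that writes each group's result back into that group's original rows.

```r
# Hypothetical toy data: the rows are deliberately NOT sorted by group
datetime <- as.POSIXct(c("2013-07-16 16:00:00", "2013-07-18 12:00:00",
                         "2013-07-16 20:00:00", "2013-07-18 16:00:00",
                         "2013-07-17 00:00:00"), tz = "UTC")
grp  <- factor(c(1, 2, 1, 2, 1))
secs <- as.numeric(datetime)  # seconds since 1970-01-01 UTC

# Gap to the previous observation in hours, NA for the first one
gap_hours <- function(x) c(NA, diff(x)) / 3600

# split() + unlist() stacks the groups in level order (all of group 1,
# then all of group 2), as unlist(by(...)) does; the result follows
# group order, not row order.
stacked <- unname(unlist(lapply(split(secs, grp), gap_hours)))
print(stacked)  # NA 4 4 NA 4

# ave() writes each group's result back into that group's own rows,
# so the values line up with the data whatever the row order.
aligned <- ave(secs, grp, FUN = gap_hours)
print(aligned)  # NA NA 4 4 4
```

If the stacked and aligned versions differ on the real data, the row order (or the level order of the grouping factor) is a likely suspect.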
Asked by odocoileus
  • I can't reproduce this based on the data you have posted. Can you edit with `dput(df[1:20, ])`? ty – user20650 Feb 08 '15 at 23:38
  • Chiming in here to say your code works for me too. – thelatemail Feb 08 '15 at 23:40
  • user20650 - I attempted to add the `dput(df[1:20,])` output but it has too many characters. OK if I pare it down? – odocoileus Feb 08 '15 at 23:59
  • Perhaps this is a silly comment, but it looks as if the function is seeing an `ID800` that is different from what's shown. What does `levels(df$ID800)` show? – Mike Satteson Feb 09 '15 at 00:07
  • Thanks for the `dput`. I can't reproduce the error, although I had to use `GRP800` rather than `ID800`. Try starting a new R session and running the `diff` commands on the data you posted. Quick comment: `GRP800` is not the same as `ID800`. Do you convert it to a factor / label it? This could be a source of error, as Mike suggests. – user20650 Feb 09 '15 at 00:17
  • I pared down the original ID800 to get rid of some characters. GRP800 is the new column and I've updated the code. I've closed R and rerun the code to no avail. – odocoileus Feb 09 '15 at 00:22
  • Yeah, I pared ID800 down to GRP800, which essentially leaves only the last digit (1, 2, etc.) so it would fit (the first 10 characters of ID800 were deleted). `levels(df$GRP800)` lists all the levels. – odocoileus Feb 09 '15 at 00:25
  • Seems that we've ruled out GRP800 as the potential culprit, so it must be the date column. I currently have it as `df$DATETIME <- strptime(df$DATETIME, "%m/%d/%Y %H:%M:%S")`. – odocoileus Feb 09 '15 at 00:31
  • Is it the case that you're showing us only the first 20 rows of a much larger data frame? If so, what happens when you apply the function to the reduced data frame, i.e. `dr <- df[1:20,]`? – Mike Satteson Feb 09 '15 at 00:34
  • In your `dput` the date is `POSIXlt`. This runs correctly and I still can't reproduce the error, but (I think) `POSIXlt` can give wrong results in a data frame (?). Try converting it with `as.POSIXct`. – user20650 Feb 09 '15 at 00:37
  • When I apply the function to the reduced data frame, the output is identical to the top 20 rows of the larger data frame. – odocoileus Feb 09 '15 at 00:38
  • This is puzzling - I switched to POSIXct and the output was identical. I guess I'll save the dataframe to another .csv file and re-run in case it's corrupted. – odocoileus Feb 09 '15 at 00:41
  • Using the same dataset in a new .csv didn't help. – odocoileus Feb 09 '15 at 00:43
  • Yes, it's hard to know what to suggest, as your code above runs correctly on my PC. Sorry. – user20650 Feb 09 '15 at 00:43
  • Thanks for trying to help me out though :) – odocoileus Feb 09 '15 at 00:52
  • Voting to close as... not a reproducible error. – IRTFM Feb 09 '15 at 04:18
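The conversion discussed in the comments (parsing with `strptime()`, then switching to `POSIXct`) can be sketched as below. This is a minimal sketch assuming the `%m/%d/%Y %H:%M:%S` input layout quoted above and a UTC time zone; the `raw` strings are hypothetical and simply reproduce the first three timestamps in the table. `difftime(..., units = "hours")` is a swapped-in replacement for `diff()`: `diff()` on date-times chooses its unit from the smallest gap, and `c(NA, ...)` then drops the `difftime` class, so the bare numbers could silently be in minutes for one group and hours for another.

```r
# Hypothetical raw strings in the "%m/%d/%Y %H:%M:%S" layout
raw <- c("07/16/2013 16:01:30", "07/16/2013 20:00:54", "07/17/2013 00:01:30")

# as.POSIXct() accepts the same format string as strptime() but returns a
# compact POSIXct vector, which is simpler to keep in a data frame column.
# tz = "UTC" is an assumption; use the data's real time zone in practice.
datetime <- as.POSIXct(raw, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Explicit units, so every value is in hours regardless of the gap sizes
n <- length(datetime)
gaps <- as.numeric(difftime(datetime[-1], datetime[-n], units = "hours"))
time800 <- c(NA, gaps)
print(time800)  # NA 3.99 4.01, matching TIME800 in rows 1-3 of the table
```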
