
I have data that looks like this:

Dates                   another column
2015-05-13 23:53:00     some values
2015-05-13 23:53:00     ....
2015-05-13 23:33:00
2015-05-13 23:30:00
...
2003-01-06 00:01:00
2003-01-06 00:01:00

The code I used is:

trainDF<-read.csv("train.csv") 
diff<-as.POSIXct(trainDF[1,1])-as.POSIXct(trainDF[,1])
head(diff)
Time differences in hours
[1] 23.88333 23.88333 23.88333 23.88333 23.88333 23.88333

However, this doesn't make sense: subtracting the first two entries should give 0, since they are exactly the same time, and subtracting the 3rd entry from the 1st should give a difference of 20 minutes, not 23.88333 hours. I get similarly nonsensical values when I try as.duration(diff) and as.numeric(diff). Why is this?

Frank
CYPHER
  • You might need to make this question reproducible: http://stackoverflow.com/a/28481250/1191259 If I do `x = c("2015-05-13 23:53:00","2015-05-13 23:53:00"); as.POSIXct(x[1]) - as.POSIXct(x)`, it works. – Frank Oct 02 '15 at 18:49
  • Subtracting two dates like that works for me too, but I still get the bad values I mentioned in the question when I subtract every entry in the first column from the first entry. I got the data from https://www.kaggle.com/c/sf-crime/data – CYPHER Oct 02 '15 at 19:00

1 Answer


If you just have a series of dates in POSIXct, you can use the diff function to calculate the difference between each pair of consecutive dates. Here's an example:

> BD <- as.POSIXct("2015-01-01 12:00:00", tz = "UTC") # Making a begin date.
> ED <- as.POSIXct("2015-01-01 13:00:00", tz = "UTC") # Making an end date.
> timeSeq <- seq(BD, ED, "min") # Creating a time series in between the dates by minute.
> 
> head(timeSeq) # To see what it looks like.
[1] "2015-01-01 12:00:00 UTC" "2015-01-01 12:01:00 UTC" "2015-01-01 12:02:00 UTC" "2015-01-01 12:03:00 UTC" "2015-01-01 12:04:00 UTC"
[6] "2015-01-01 12:05:00 UTC"
> 
> diffTime <- diff(timeSeq) # Takes the difference between each adjacent time in the time series.
> print(diffTime) # Printing out the result.
Time differences in mins
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> 
> # For the sake of example, let's make a hole in the data.
> 
> limBD <- as.POSIXct("2015-01-01 12:15:00", tz = "UTC") # Start of the hole we want to create. 
> limED <- as.POSIXct("2015-01-01 12:45:00", tz = "UTC") # End of the hole we want to create.
> 
> timeSeqLim <- timeSeq[timeSeq <= limBD | timeSeq >= limED] # Make a hole of 1/2 hour in the sequence.
> 
> diffTimeLim <- diff(timeSeqLim) # Taking the diff.
> print(diffTimeLim) # There is now a large gap, which is reflected in the printout.
Time differences in mins
 [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 30  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
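Reading the gap off the printout works for a short series; for something as long as the Kaggle file, it may be easier to ask R where the gaps are. A minimal sketch that rebuilds the example above and uses a hypothetical one-minute threshold (`stepMins`, `gapIdx`, `gapStart` and `gapEnd` are illustrative names, not part of the original answer):

```r
# Rebuild the example above: one-minute steps from 12:00 to 13:00 UTC,
# keeping only the points at or before 12:15 and at or after 12:45.
BD <- as.POSIXct("2015-01-01 12:00:00", tz = "UTC")
ED <- as.POSIXct("2015-01-01 13:00:00", tz = "UTC")
timeSeq <- seq(BD, ED, "min")
limBD <- as.POSIXct("2015-01-01 12:15:00", tz = "UTC")
limED <- as.POSIXct("2015-01-01 12:45:00", tz = "UTC")
timeSeqLim <- timeSeq[timeSeq <= limBD | timeSeq >= limED]

# Convert the steps to plain numbers in a fixed unit, so the comparison
# does not depend on the unit R picks automatically for printing.
stepMins <- as.numeric(diff(timeSeqLim), units = "mins")

# Hypothetical threshold: any step longer than one minute counts as a gap.
gapIdx <- which(stepMins > 1)

# Each gap runs from timeSeqLim[gapIdx] to the next time point.
gapStart <- timeSeqLim[gapIdx]
gapEnd <- timeSeqLim[gapIdx + 1]
print(data.frame(start = gapStart, end = gapEnd, minutes = stepMins[gapIdx]))
```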

However, I read through your post again, and it seems you just want to subtract each entry from the one in the first row. I used the same sample as above to do this:

> timeSeq[1] - timeSeq[2:length(timeSeq)]
Time differences in mins
 [1]  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 -27 -28 -29 -30 -31 -32 -33 -34 -35 -36
[37] -37 -38 -39 -40 -41 -42 -43 -44 -45 -46 -47 -48 -49 -50 -51 -52 -53 -54 -55 -56 -57 -58 -59 -60

This gives what I'd expect. Trying the same thing on a data.frame:

> timeDF <- data.frame(time = timeSeq)
> timeDF[1,1] - timeDF[, 1]
Time differences in secs
 [1]     0   -60  -120  -180  -240  -300  -360  -420  -480  -540  -600  -660  -720  -780  -840  -900  -960 -1020 -1080 -1140 -1200 -1260 -1320 -1380
[25] -1440 -1500 -1560 -1620 -1680 -1740 -1800 -1860 -1920 -1980 -2040 -2100 -2160 -2220 -2280 -2340 -2400 -2460 -2520 -2580 -2640 -2700 -2760 -2820
[49] -2880 -2940 -3000 -3060 -3120 -3180 -3240 -3300 -3360 -3420 -3480 -3540 -3600
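The two results above come out in different units (mins, then secs). Subtracting POSIXct values calls difftime() with units = "auto", which picks the unit from the smallest absolute difference, so including the first row's zero difference switches the output to seconds. A sketch, rebuilding the same timeSeq and timeDF, that makes the unit explicit (`d`, `elapsedMins`, `autoAll` and the other names are illustrative):

```r
BD <- as.POSIXct("2015-01-01 12:00:00", tz = "UTC")
ED <- as.POSIXct("2015-01-01 13:00:00", tz = "UTC")
timeSeq <- seq(BD, ED, "min")
timeDF <- data.frame(time = timeSeq)

# With the default "auto" units, the unit depends on the smallest |difference|.
autoAll <- timeDF[1, 1] - timeDF[, 1]   # includes a zero difference
autoRest <- timeSeq[1] - timeSeq[-1]    # smallest |difference| is 60 s
print(units(autoAll))    # "secs"
print(units(autoRest))   # "mins"

# difftime() with an explicit unit: first row minus every row, in minutes.
d <- difftime(timeDF[1, 1], timeDF[, 1], units = "mins")
print(head(d))

# Plain numbers in a chosen unit, e.g. for further arithmetic.
elapsedMins <- as.numeric(d, units = "mins")
elapsedHours <- as.numeric(d, units = "hours")
```

Fixing the unit this way also makes as.numeric() results comparable from one run to the next, instead of depending on which rows happen to be included.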

It seems I'm not encountering the same problem as you. Perhaps coerce everything to POSIXct first and then do the subtraction? Check the class of your data to make sure it really is POSIXct, and look at the actual values you are subtracting; that may give you some insight.
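The checks suggested above can be sketched as follows. This uses a small inline CSV as a hypothetical stand-in for train.csv (built from the timestamps shown in the question) and converts with an explicit format and time zone; `csvText` and `df` are illustrative names.

```r
# Hypothetical stand-in for train.csv, using timestamps from the question.
csvText <- "Dates,Value
2015-05-13 23:53:00,a
2015-05-13 23:53:00,b
2015-05-13 23:33:00,c
2015-05-13 23:30:00,d"

df <- read.csv(text = csvText)

# read.csv() does not create POSIXct columns: this is "character" in R >= 4.0.0
# and "factor" in older versions (stringsAsFactors used to default to TRUE).
print(class(df$Dates))

# Convert with an explicit format and time zone so R does not have to guess.
# as.character() handles both the factor and the character case.
df$Dates <- as.POSIXct(as.character(df$Dates),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(class(df$Dates))   # "POSIXct" "POSIXt"
print(anyNA(df$Dates))   # FALSE if every row matched the format

# Look at the actual values before subtracting.
print(head(df$Dates))
print(df$Dates[1] - df$Dates)
```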


EDIT:

After downloading the file, here's what I ran (with the file read into trainDF):

trainDF$Dates <- as.POSIXct(trainDF$Dates, tz = "UTC") # Coercing to POSIXct.
datesDiff <- trainDF[1, 1] - trainDF[, 1] # Taking the difference of each date with the first date.
head(datesDiff) # Printing out the head.

With results:

Time differences in secs
[1]    0    0 1200 1380 1380 1380

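The only thing I did differently was use the time zone UTC, which has no daylight saving time shifts. I assumed that would make no difference, but it might: in a local time zone that observes daylight saving time, some clock times do not exist, and that can change how the whole column is parsed.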

However, when I used exactly your method, I got the same results as you:

> diff<-as.POSIXct(trainDF[1,1])-as.POSIXct(trainDF[,1])
> head(diff)
Time differences in hours
[1] 23.88333 23.88333 23.88333 23.88333 23.88333 23.88333

So something in your approach causes this, but I can't say exactly what. I do find it is typically safer to coerce first and then do the arithmetic, rather than doing it all in one line.
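One possible explanation, offered as an assumption rather than something verified against the Kaggle file: 23.88333 hours is exactly 23 h 53 min, the time of day of the first entry. That is what you would get if the first entry (converted on its own) kept its time while every entry of the full column was parsed as a date at midnight. When as.POSIXct() receives character or factor input without a format, it tries a short list of formats and uses the first one under which every non-missing entry parses, so a single entry that fails the full date-time format can make it fall back to "%Y-%m-%d" for the whole column. In a local time zone with daylight saving time, clock times inside the spring-forward gap do not exist and may fail to parse, while in UTC they parse fine; that would fit the tz = "UTC" version working. Because DST handling varies by platform, the sketch below triggers the same fallback with a deliberately date-only string instead (`x`, `guessed`, `first` and `explicit` are illustrative names):

```r
# The second string has no time part, so it does not match "%Y-%m-%d %H:%M:%OS".
# Without a format, R falls back to a format every entry satisfies ("%Y-%m-%d"),
# and the first entry loses its time as well.
x <- c("2015-05-13 23:53:00", "2015-05-13")

guessed <- as.POSIXct(x, tz = "UTC")
print(guessed)            # both entries at midnight

# Converting the first element on its own keeps its time, as in the question.
first <- as.POSIXct(x[1], tz = "UTC")
print(first - guessed)    # 23.88333 hours for each entry

# With an explicit format, a mismatch shows up as NA instead of silently
# changing the format used for every entry.
explicit <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(explicit)           # second entry is NA
```

If this is the cause, passing format = "%Y-%m-%d %H:%M:%S" together with tz = "UTC" when converting trainDF$Dates should avoid it.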

giraffehere
  • I tried to coerce everything to POSIXct first with `fcol<-as.POSIXct(trainDF[,1])`, but it still gives wrong results. `fcol[3]-fcol[1]` gives `Time difference of 0 secs`, even though it should be 20 minutes. I got the data from https://www.kaggle.com/c/sf-crime/data – CYPHER Oct 02 '15 at 19:27
  • I downloaded the file. Check my edit in my answer for details. – giraffehere Oct 02 '15 at 19:43