
I have a character datetime column in a file. I load the file (into a data.table) and do things that require the column to be converted to POSIXct. I then need to write the POSIXct value back to file, but the datetime will not be the same (because it is printed incorrectly).

This print/formatting issue is well known and has been discussed several times. I've read some posts describing this issue. The most authoritative answers I found are given in response to this question. The answers to that question provide two functions (myformat.POSIXct and form) that are supposed to solve this issue, but they do not seem to work on this example:

x <- "04-Jan-2013 17:22:08.139"
options("digits.secs"=6)
form(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),format="%d-%b-%Y %H:%M:%OS3")
[1] "04-Jan-2013 17:22:08.138"
form(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),format="%d-%b-%Y %H:%M:%OS4")
[1] "04-Jan-2013 17:22:08.1390"
myformat.POSIXct(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),digits=3)
[1] "2013-01-04 17:22:08.138"
myformat.POSIXct(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),digits=4)
[1] "2013-01-04 17:22:08.1390"

My sessionInfo:

R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                        
[5] LC_TIME=C                              

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] fasttime_1.0-0   data.table_1.8.9 bit64_0.9-2      bit_1.1-9
[5] sas7bdat_0.3     chron_2.3-43     vimcom_0.9-6    

loaded via a namespace (and not attached):
[1] tools_2.15.2
statquant
  • For this date, both functions `form()` and `myformat.POSIXct` are doing essentially the same thing, rounding the seconds value to three places. But 0.139 cannot be represented exactly (.1389999 is what I see in the debugger for the fractional part of the rounded value) so the truncation remains. Note that 139 is prime (and thus relatively prime to 2 and 5). – Matthew Lundberg Mar 14 '13 at 01:37

4 Answers


So I guess you do need a little fudge factor added to my suggestion here: https://stackoverflow.com/a/7730759/210673. This seems to work but perhaps might include other bugs; test carefully and think about what it's doing before using for anything important.

myformat.POSIXct <- function(x, digits=0) {
  x2 <- round(unclass(x), digits)
  attributes(x2) <- attributes(x)
  x <- as.POSIXlt(x2)
  x$sec <- round(x$sec, digits) + 10^(-digits-1)
  format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
}
Aaron left Stack Overflow
  • Your fudge factor looks like a good one here. It would be possible to test this in a loop, at least for small values of digits. Oh, and I'm totally stealing your fudge factor. I added it to my answer in the other, identical question, and will use it in actual code. – Matthew Lundberg Mar 14 '13 at 01:47
  • Glad you think it looks good. It seemed like a reasonable thing to do but I didn't take the time to think it through all the way. – Aaron left Stack Overflow Mar 14 '13 at 03:01
  • Good news: it looks like it works on my 1.5M training set (with milliseconds). It seems to be very slow, but hopefully, if the fix is good, maybe it can be used to fix the way POSIXct displays (I mean prints) datetimes at C level... – statquant Mar 14 '13 at 10:52
  • I actually doubt all the code here is necessary with the fudge factor added. I was rounding twice as I thought that would make the fudge factor unneeded, but you discovered I was wrong. It might be enough to just round and add the fudge factor to the POSIXct initially and then print. – Aaron left Stack Overflow Mar 14 '13 at 11:57
  • Also stay tuned in the next version of R; in the comments to the other question you'll see that it looks like they may have added a fudge factor in the default printing code itself. – Aaron left Stack Overflow Mar 14 '13 at 11:59
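Following up on the simplification suggested in the comments above (round the POSIXct once, add the fudge factor, then print), here is a minimal sketch combined with the loop test proposed earlier. The helper name `format_ms` is hypothetical and does not come from any answer here. The check only covers the 1000 millisecond values within a single second, in UTC, so treat it as a sanity check rather than proof.

```r
# Hypothetical helper: round the POSIXct value itself, add the fudge factor,
# then let format() drop the extra digits.
format_ms <- function(x, digits = 3) {
  x2 <- round(unclass(x), digits) + 10^(-digits - 1)
  attributes(x2) <- attributes(x)   # restore class and tzone
  format(x2, paste0("%Y-%m-%d %H:%M:%OS", digits))
}

# Round-trip every millisecond within one second (UTC keeps this reproducible).
input  <- sprintf("2013-01-04 17:22:08.%03d", 0:999)
parsed <- as.POSIXct(input, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
output <- format_ms(parsed, digits = 3)

all(output == input)   # TRUE means every timestamp survived the round trip
```

The fudge factor is one tenth of the last printed digit, so it is large enough to push a value like .1389999 past .139 but too small to change the digit that is printed.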

As the answers to the questions you linked to already say, how a value is printed/formatted is not the same as what the actual value is. This is just a printed representation issue.

R> as.POSIXct('2011-10-11 07:49:36.3')-as.POSIXlt('2011-10-11 07:49:36.3')
Time difference of 0 secs
R> as.POSIXct('2011-10-11 07:49:36.2')-as.POSIXlt('2011-10-11 07:49:36.3')
Time difference of -0.0999999 secs

Your understanding that POSIXct is less precise than POSIXlt is incorrect. You're also incorrect in saying that you can't include a POSIXlt object as a column in a data.frame.

R> x <- data.frame(date=Sys.time())
R> x$date <- as.POSIXlt(x$date)
R> str(x)
'data.frame':   1 obs. of  1 variable:
 $ date: POSIXlt, format: "2013-03-13 07:38:48"
Joshua Ulrich
  • @statquant: because it's another question, not an answer. – Joshua Ulrich Mar 13 '13 at 13:17
  • OK for the representation; for inclusion in data.frame I meant data.table. The post I am referring to gives suggestions on how to solve this representation issue, however with 04-01-2013 17:22:08.139 it seems to fail (see my EDIT). Is there a way to get an accurate representation from POSIXct (at the millisecond level)? – statquant Mar 13 '13 at 13:19
  • @statquant: It *is* accurate. You're still confusing the actual `POSIXct` **value** with what is **printed**. – Joshua Ulrich Mar 13 '13 at 13:56
  • No, I am not; I am actually asking how I can accurately print the time of a POSIXct object. Let's say I have a character datetime column in a file. I load the file and do things that require the column to be cast to POSIXct. If I need to write the file back, the datetime will not be the same (it is printed wrongly). – statquant Mar 13 '13 at 14:00
  • @statquant: I see. That's a clearly articulated problem. Can you edit your question to remove all the extraneous prose, quotes from other posts, and your guesses at solutions? Leave an example of your input and desired output and I'm sure someone will provide an answer. – Joshua Ulrich Mar 13 '13 at 14:03

When you write

My understanding is that POSIXct representation is less precise than the POSIXlt representation

you are plain wrong.

It is the same representation for both -- down to milliseconds on Windows, and down to (almost) microseconds on the other OSs. Did you read help(DateTimeClasses) ?

As for your last question, yes the development version of my RcppBDT package uses Boost Date.Time and can go all the way to nanoseconds if your OS supports it and you turned the proper representation on. But it does replace POSIXct, and does not yet support vectors of time objects.

Edit: Regarding your follow-up question:

R> one <- Sys.time(); two <- Sys.time(); two - one
Time difference of 7.43866e-05 secs
R>
R> as.POSIXlt(two) - as.POSIXlt(one)
Time difference of 7.43866e-05 secs
R> 
R> one    # options("digits.secs"=6) on my box
[1] "2013-03-13 07:30:57.757937 CDT"
R> 

Edit 2: I think you are simply experiencing that floating point representation on computers is inexact:

R> print(as.numeric(as.POSIXct("04-Jan-2013 17:22:08.138",
+                   format="%d-%b-%Y %H:%M:%OS")), digits=18)
[1] 1357341728.13800001
R> print(as.numeric(as.POSIXct("04-Jan-2013 17:22:08.139",
+                   format="%d-%b-%Y %H:%M:%OS")), digits=18)
[1] 1357341728.13899994
R> 

The difference is not precisely 1/1000 as you assumed.
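To connect this to the `.138` seen in the question, here is a small sketch using the epoch value printed above as a plain number. It assumes, as the comments on the question suggest, that the formatted output drops digits beyond the requested ones instead of rounding them.

```r
# The epoch value printed above for "...08.139", as a plain double
x <- 1357341728.139
frac <- x - floor(x)      # fractional second as actually stored

sprintf("%.10f", frac)    # slightly below 0.139
floor(frac * 1000)        # truncating to milliseconds gives 138
round(frac * 1000)        # rounding to milliseconds gives 139
```

The stored value is a hair under .139, so any formatter that truncates will show .138, while one that rounds will show .139.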

Dirk Eddelbuettel
  • Hello Dirk, are you sure as far as the representation is concerned? I edited my question, quoting another post to illustrate what I meant. I read `help(DateTimeClasses)`; it is the same as `?POSIXlt`, and I do not see anything Windows-specific. As you seem to have been deep into those `POSIXct` issues already, how can I get a correct representation of a millisecond datetime with POSIXct? – statquant Mar 13 '13 at 12:04
  • Dirk, any reference on your statement about Windows vs other OSs ? – statquant Mar 13 '13 at 13:26
  • Please re-read my original answer. Windows --> milliseconds only. – Dirk Eddelbuettel Mar 13 '13 at 13:28
  • 1
    Hi Dirk, I think as far as floating-point representation goes, the POSIXct is indeed less precise; it has to fit a lot more significant digits into the same size `numeric` as it has the number of seconds since 1970 plus any fractional part; since POSIXlt separates the seconds into its own numeric, there's less significant digits so the floating point representation can be more precise. @statquant is referring to my answer here http://stackoverflow.com/a/7730759/210673 which gives an example. – Aaron left Stack Overflow Mar 13 '13 at 16:55
  • @statquant: I believe this to be wrong. POSIXct is 64 bit double split into 53 and 11 bit. Show source file or R internals / R language manuals for the 40 bit claim. – Dirk Eddelbuettel Mar 15 '13 at 14:06
  • From ?POSIXlt: Class ‘"POSIXlt"’ is a named list of vectors representing [...], with sec as numeric and the rest as integer, so 40 bytes (I realized I miswrote 40 bits instead of 40 bytes). – statquant Mar 15 '13 at 14:21

Two things:

1) @statquant is right, as is @Aaron in his comment (and the otherwise known experts @Joshua Ulrich and @Dirk Eddelbuettel are wrong), but that will not be important for the main question here:

POSIXlt by design is definitely more accurate in storing times than POSIXct: As its seconds are always in [0, 60), it has a granularity of about 6e-15, i.e., 6 femtoseconds, which is dozens of millions of times finer than POSIXct.

However, this is not very relevant here (and for current R): Almost all operations, notably numeric ones, use the Ops group method (yes, not known to beginners, but well documented); just look at Ops.POSIXt, which indeed trashes the extra precision by first coercing to POSIXct. In addition, format()ing/print()ing uses at most 6 decimals after the ".", and hence also does not distinguish between the internally higher precision of POSIXlt and the "only" few hundred nanosecond granularity of POSIXct.
(For the above reason, both Dirk and Joshua were led to their wrong assertion: For all simple practical uses, the precision of *lt and *ct is made the same.)
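The granularity figures in point 1 can be checked with a short sketch. It assumes IEEE 754 doubles with 52 fraction bits, which is what R's `numeric` uses on common platforms. The helper `ulp` is a hypothetical name for illustration, not a base R function.

```r
# Spacing between adjacent doubles near x (assumes IEEE 754 doubles)
ulp <- function(x) 2^(floor(log2(abs(x))) - 52)

ct     <- 1357341728.139   # POSIXct: seconds since 1970 for the date in the question
lt_sec <- 59               # POSIXlt: seconds field only, near the top of [0, 60)

ulp(ct)                    # about 2.4e-07 seconds
ulp(lt_sec)                # about 7.1e-15 seconds
ulp(ct) / ulp(lt_sec)      # 2^25, i.e. about 33.5 million
```

Smaller values of the seconds field have even finer spacing, so 59 is close to the worst case for POSIXlt.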

2) I do tend to agree that we (R Core) should improve the format()ing and hence print()ing of POSIXt objects with such fractions of seconds (even after the bug fix mentioned by @Aaron above).
But then I may be wrong, and "we" have got it right, by some definition of "right" ;-)

Martin Mächler