Suppose I have an R data.table (data.frame) with a column of timestamps stored as strings, e.g.:
"2018-10-12 12:35:37.004226",
"2018-10-12 12:35:38.00332",
"2018-10-12 12:35:39.000999",...
What is the best data type on which to carry out a rolling join? Should I convert this column to Unix epoch time (type numeric), or keep it as is (type character)?
My primary objective is accurate and identical output across different systems. My initial concern was that since numeric values are not represented exactly, the conversion might yield a different ordering of the timestamps (and thus different join results) on different systems.
I've now come to understand that while R does not represent every numeric value exactly, it does seem to map a given number to the same representation on every system.
That is,
sprintf("%.60f",as.numeric(as.POSIXct("2018-10-12 12:35:37.004226",tz="GMT")))
produces this representation
[1] "1539347737.004225969314575195312500000000000000000000000000000000000000"
across four different systems to which I have access (the check is taken from the question Double precision (64-bit) representation of numeric value in R (sign, exponent, significand)).
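As an aside, a more compact cross-system check than printing 60 decimal places is to print the value with 17 significant digits (enough to uniquely identify an IEEE 754 double) or in hexadecimal float notation; this is only a sketch, and "%a" is a C99 feature that may not be supported on every platform:

x <- as.numeric(as.POSIXct("2018-10-12 12:35:37.004226", tz = "GMT"))
# 17 significant digits uniquely identify a double, so this string
# should be identical on any system using IEEE 754 doubles
sprintf("%.17g", x)
# hexadecimal float notation exposes the bits directly (C99 "%a")
sprintf("%a", x)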
I think it is thus safe to say that as long as one's timestamps do not require more precision than a double can hold (roughly 15-16 significant decimal digits, which for present-day epoch values works out to a resolution of about a quarter of a microsecond), all rolling joins should match across systems: the string-to-double conversion is deterministic on IEEE 754 hardware, so there is no chance for the representation to alter the strict ordering of the timestamps (and thus yield different join results on different systems). Note that the 50+ digits printed above are simply the exact decimal expansion of the nearest representable double, not its precision.
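To put a number on that resolution, here is a quick sketch of the spacing between adjacent representable doubles (one "ulp") near an epoch value from 2018; the figure depends only on the magnitude of the timestamp:

x <- as.numeric(as.POSIXct("2018-10-12 12:35:37", tz = "GMT"))
# spacing between adjacent doubles near x is 2^(exponent - 52);
# for x in [2^30, 2^31) seconds this is 2^-22, i.e. about 0.24 microseconds
2^(floor(log2(x)) - 52)
[1] 2.384186e-07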
So for rolling joins on time stamps, convert to numeric, carry out your join, and don't look back.
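For completeness, here is a minimal sketch of what that looks like in data.table (the table and column names dt_quotes, dt_trades, px, qty are made up for illustration):

library(data.table)

# made-up example tables
dt_quotes <- data.table(
  ts = c("2018-10-12 12:35:37.004226",
         "2018-10-12 12:35:38.00332",
         "2018-10-12 12:35:39.000999"),
  px = c(100.1, 100.2, 100.3)
)
dt_trades <- data.table(
  ts = c("2018-10-12 12:35:37.5", "2018-10-12 12:35:38.9"),
  qty = c(10, 20)
)

# convert the character timestamps to numeric epoch seconds
dt_quotes[, t := as.numeric(as.POSIXct(ts, tz = "GMT"))]
dt_trades[, t := as.numeric(as.POSIXct(ts, tz = "GMT"))]

setkey(dt_quotes, t)
setkey(dt_trades, t)

# rolling join: for each trade, take the most recent quote at or before it
dt_quotes[dt_trades, roll = TRUE]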
Is there agreement on this?