8

My actual data looks like:

8/8/2013 15:10
7/26/2013 10:30
7/11/2013 14:20
3/28/2013 16:15
3/18/2013 15:50

When I read this from the excel file, R reads it as:

41494.63
41481.44
41466.60
41361.68
41351.66

So I used as.POSIXct(as.numeric(x[1:5])*86400, origin="1899-12-30",tz="GMT") and I got:

2013-08-08 15:07:12 GMT
2013-07-26 10:33:36 GMT
2013-07-11 14:24:00 GMT
2013-03-28 16:19:12 GMT
2013-03-18 15:50:24 GMT

Why there is a difference in time? How to overcome it?

Jaap
  • 81,064
  • 34
  • 182
  • 193
Kavipriya
  • 441
  • 4
  • 17
  • Not sure what's going with the answer here, but maybe try this `as.POSIXct(x*60*60*24, tz = "GMT", origin = "1900-01-01")` if this is your x: `x <- c(41494.63, 41481.44, 41466.60, 41361.68, 41351.66)` – David Arenburg Jul 27 '15 at 08:38
  • 1
    @DavidArenburg But the minutes are not correct. –  Jul 27 '15 at 08:42
  • @David If i change the date, my result date will also change. But result dates are correct in the above case. My problem is with minutes only. – Kavipriya Jul 27 '15 at 08:44
  • @Pascal maybe these are rounding issues as `41494.63` had probably more digits, dunno. I didn't post this as an answer, rather as an alternative to the highly upvoted answer which has nothing to do with the question below. I'm sure OP can tweak it to his needs. – David Arenburg Jul 27 '15 at 08:44
  • @DavidArenburg Yes, might be related to a rounding. –  Jul 27 '15 at 08:45
  • 1
    Oh, I didn't see that you already tried that. Anyway, this is the way to go and it's probably Excel is messing up something there. You will have to tweak it to match your needs. – David Arenburg Jul 27 '15 at 08:46
  • So, How can I avoid rounding and get accurate time? – Kavipriya Jul 27 '15 at 08:47
  • Your Excel data has no seconds there, while R `POSIXct` has to have seconds – David Arenburg Jul 27 '15 at 08:48
  • 1
    At this point Kavi could you share how you loaded data into R? I am curious to find out how a raw data like `8/8/2013 15:10` is read in R like `41494.63`. Thank you. – SabDeM Jul 27 '15 at 08:54
  • @SabDeM `x<-as.matrix(read.xlsx("Data.xlsx",sheet=1))` this is how I read my data. `read.xlsx` from openxlsx library – Kavipriya Jul 27 '15 at 09:25

4 Answers4

8

The problem is that either R of Excel is rounding the number to two decimals. When you convert the for example the cell with 8/8/2013 15:10 to text formatting (in Excel on Mac OSX), you get the number 41494.63194.

When you use:

as.POSIXct(41494.63194*86400, origin="1899-12-30",tz="GMT")

it will give you:

[1] "2013-08-08 15:09:59 GMT"

This is 1 second off from the original date (which is also an indication that 41494.63194 is rounded to five decimals).

Probably the best solution to do is export your excel-file to a .csv or a tab-separated .txt file and then read it into R. This gives me at least the correct dates:

> df
            datum
1  8/8/2013 15:10
2 7/26/2013 10:30
3 7/11/2013 14:20
4 3/28/2013 16:15
5 3/18/2013 15:50
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • 1
    `41494.63194` with the last 4 recurring gives exactly the right time. The rounding is definitely guilty here. – thelatemail Jul 27 '15 at 23:15
3

Given

x <- c("8/8/2013 15:10","7/26/2013 10:30","7/11/2013 14:20","3/28/2013 16:15","3/18/2013 15:50")

(which is read as a character vector),

try

x <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "GMT")

It reads correctly as a POSIXct vector to me.

Nicola Pasquino
  • 441
  • 1
  • 3
  • 11
  • 3
    Since OP specified `tz` argument I think it is better to do not forget it because in Italy where we live `x` will have `CEST` as time zone. – SabDeM Jul 27 '15 at 07:58
  • In my case, x <-c(41494.63, 41481.44, 41466.60, 41361.68, 41351.66) not as u did. when read from excel, date time is read as these values. – Kavipriya Jul 27 '15 at 08:30
  • @kavi - you should specify that you are using Excel in your question - this makes a difference as Excel is nuts in dealing with dates. – thelatemail Jul 27 '15 at 08:32
  • okay! Sorry for making the ques look different. I did the changes. – Kavipriya Jul 27 '15 at 08:33
  • 1
    I don't understand how this answer answers the question. – David Arenburg Jul 27 '15 at 08:35
2

Maybe it is a matter of how R reads the data. Just an example here with lubridate seems to work well.

x <- "8/8/2013 15:10"
library(lubridate)
dmy_hm(x, tz = "GMT")
[1] "2013-08-08 15:10:00 GMT"
SabDeM
  • 7,050
  • 2
  • 25
  • 38
  • This doesn't answer the question neither I guess. – David Arenburg Jul 27 '15 at 08:40
  • @DavidArenburg I think you are right but the last question of the OP was "how to overcome it?" then I provided a way to overcome the problem by using `lubridate`. A pragmatic approach that does not explain why the problem happens but allows the OP to defeat it. I am sure a power R user will solve this and will gain a ton of up vote. – SabDeM Jul 27 '15 at 08:45
  • But OPs input is a number such as `41494.63`, not `"8/8/2013 15:10"` – David Arenburg Jul 27 '15 at 08:50
  • @DavidArenburg... mmm... I think you are right. At this point maybe the OP has to share with use the code he/she used to read data in order to better understand what is going on. I'll ask. – SabDeM Jul 27 '15 at 08:52
2

This is how it works over here on a Windows system. This is what a source Excel 2010 file looks like:

date                num         secs        constant    Rtime
(mm/dd/yyyy)        (in Excel)  (num*86400) (Windows)   (secs-constant) 
08/08/2013 15:10    41494.63    3585136200  2209161600  1375974600
07/26/2013 10:30    41481.44    3583996200  2209161600  1374834600
11/07/2013 14:20    41585.60    3592995600  2209161600  1383834000
03/28/2013 16:15    41361.68    3573648900  2209161600  1364487300
03/18/2013 15:50    41351.66    3572783400  2209161600  1363621800

Rtime <- c(1375974600,1374834600,1383834000,1364487300,1363621800)
as.POSIXct(Rtime,origin="1970-01-01",tz="GMT")
#[1] "2013-08-08 15:10:00 GMT" "2013-07-26 10:30:00 GMT"
#[3] "2013-11-07 14:20:00 GMT" "2013-03-28 16:15:00 GMT"
#[5] "2013-03-18 15:50:00 GMT"

Why this constant? Firstly, because Excel and Office generally is a mess when dealing with dates. Seriously, look over here: Why is 1899-12-30 the zero date in Access / SQL Server instead of 12/31?

2209161600 is the difference in seconds between the POSIXct start of 1970-01-01 and 1899-12-30, which is the 0 point in Excel on Windows.

dput(as.POSIXct(2209161600,origin="1899-12-30",tz="GMT"))
#structure(0, tzone = "GMT", class = c("POSIXct", "POSIXt"))
Community
  • 1
  • 1
thelatemail
  • 91,185
  • 12
  • 128
  • 188