1

I am reading an xlsx file using read_excel from the readxl package in R. My column has a string in it so readxl gets it as a string. However the date times get transformed into a numeric before and then given as a string. Using as.Date I can correctly get the Date portion of the datetime but I loose the timestamp. Using as.POSIXct with different origins does not give the right timestamp. Here is an example:

Time as it appears in Excel: 28/09/2016 14:47 It is read by read_excel as a string: "42641.6163310185" The following fail to give 28/09/2016 14:47

> as.POSIXct(as.numeric("42641.6163310185"), origin = "1899-12-30")
[1] "1899-12-30 11:50:42.616331 GMT"

> as.POSIXct(as.numeric("42641.6163310185"), origin = "2016-09-28")
[1] "2016-09-28 12:50:41.616331 BST"

What is the correct transformation and is some information getting lost by the read_excel function?

EDIT: From data difference in as.POSIXct in R with Excel The transformation is *86400 which I wasn't aware of. The following works:

> as.POSIXct(42641.6163310185*86400, origin = "1899-12-30", tz="UTC")
[1] "2016-09-28 14:47:30 UTC"
Community
  • 1
  • 1
Courvoisier
  • 904
  • 12
  • 26
  • 1
    http://stackoverflow.com/a/39799050/1412059 – Roland Nov 14 '16 at 09:39
  • Excel stores dates in a binary format used by OLE Automation and VB6, and used to be the most common date format. Nowadays, you'll find it as `OA Date`. The integer part is the date offset since `1899-12-30` (negatives allowed) while the decimal part is the time expressed as the fraction of a day. There's no point trying to treat this as a string, because it really is a decimal number – Panagiotis Kanavos Nov 14 '16 at 09:40
  • 1
    `read_excel` should be able to detect this is a date and import it as such. So I am guessing your column contains other entries which are not in this format? To force it to be a date, try adjusting the col_types argument in `read_excel`. If for example the name of your date column is "x", import the file as `read_excel(, col_types = c(x = "date", ...))` (where `...` are your other columns and their types) – konvas Nov 14 '16 at 09:43
  • @PanagiotisKanavos: Yes, the column contains things other that dates – Courvoisier Nov 28 '16 at 15:06
  • @konvas : I do not want to typecast. The spreadsheet might change. – Courvoisier Nov 28 '16 at 15:06
  • If anything this would be desirable, because it would warn you that your spreadsheet format has changed and you should double check your script still works as intended, no? – konvas Nov 28 '16 at 15:13
  • @Courvoisier the "duplicate" isn't relevant. There's nothing wrong with Excel's value, nor should it need any conversion. The OADate binary format is very well defined and `read_excel` should pick it without problems. The real problem is that the column contains non-date data, which obviously results in conversion errors. How do you expect to load date data when the column contains mixed data? – Panagiotis Kanavos Nov 28 '16 at 15:37
  • @PanagiotisKanavos : as text. The column is getting imported as text anyway so why make it into a numeric, realize that the column has other things, then import it as text. – Courvoisier Nov 29 '16 at 09:48
  • @Courvoisier because you just had to convert some of that data. You should *clean* your data first, then try to use it. Or you may be reading the sheet the wrong way - the same sheet can have multiple *tables* which you can read individually. Or maybe you get both data and headers mixed up. In any case, your approach is what actually forces multiple conversions, when the date data is already in a date format – Panagiotis Kanavos Nov 29 '16 at 09:53
  • @konvas: Let me test. Can I just typecast just 1 column in read_excel? I thought I would need to typecast all columns and have them in the right order. That would harm extensibility. But if I could only do `col_types = c(x = "date")` for just the "x" column that would be good. – Courvoisier Nov 29 '16 at 09:55
  • Indeed you can typecast just 1 column - have a look at https://cran.r-project.org/web/packages/readr/vignettes/column-types.html – konvas Nov 29 '16 at 10:25
  • @konvas: I checked. The link you sent was for readr not read_excel. I can't typecast just 1 column in read_excel. – Courvoisier Dec 12 '16 at 13:43
  • @Courvoisier You're right I thought it was handled the same way as in `readr` because it's the same argument name, but `readxl` doesn't work that way. – konvas Dec 12 '16 at 13:58

0 Answers0