1

Apologies in advance if this has already been asked elsewhere, but I've tried different attempts and nothing has worked so far.

I wish to open a big Excel file (> 21000 rows) using :

    myData <- read.xlsx("....xlsx", sheet = 1, colNames = TRUE)

I have two columns with dates that I need to convert in "01-01-2019" format instead of "43000". Therefore I use :

    myData$Begin.Date <- convertToDate(myData$Begin.Date)
    myData$End.Date <- convertToDate(myData$End.Date)

Then, I also have two columns in the Excel file with Times (in hours:min ranging from 00:00 to 23:59). However, when I read the Excel file in R, all the hours are converted in values from 0 to 0,99. How can I convert these two colums to keep the "hours:min" format ? If I use the convertToDate function, the values 00:00 that appear as 0 now will be converted in "1899-12-30"

    myData$Begin.Time <- ??(myData$Begin.Time)
    myData$End.Time <- ??(myData$End.Time)

Here is what I've obtained up to now :

    > myData <- read.xlsx("....xlsx", sheet = 1, colNames = TRUE)
    > myData$Begin.Date <- with(myData, convertToDateTime(Begin.Date))
    > myData$End.Date <- with(myData, convertToDateTime(End.Date))
    > myData$Begin.Time <- with(myData, convertToDateTime(Begin.Time))
    > myData$End.Time <- with(myData, convertToDateTime(End.Time))
    > head(myData, 2)
             xxxxx yyyyy zzzzz aaaaa bbbbb nnnnnn qqqqq ssssss
    1 xxx     yyy  zzz   aaa   yyyyyy tttttt  B  rrr
    2 xxx     yyy  zzz   aaa   yyyyyy tttttt  B  rrr
      kkkkkk mmmmmm ooooo Begin.Date Begin.Time   End.Date   End.Time 
    1 u       yyy     y   2019-01-01 1899-12-30 2019-01-29 1899-12-30      
    2 u       yyy     y   2019-01-01 1899-12-30 2019-01-29 1899-12-30


    dput(head(myData,3))
    structure(list(Sample_ID = c("...", "...", "..."), Locality.Name = c("...", "...", "..."), Code = c("...", "...", "..."), Catchment =  c("...", 
    "...", "..."), Decimal.Latitude = c(..., ..., ...), Decimal.Longitude = c(..., ..., ...), Sample.type = c("...", 
    "...", "..."), Sample.Treatment = c(".", ".", "."), Chemicals = c("...", "...", "..."), Apparatus.Type = c(".", ".", "."), 
    Begin.Date = c(43466, 43466, 43466), Begin.Time = c(0, 0, 0), End.Date = c(43494, 43494, 43494), End.Time = c(0, 0, 0), Value = c(..., ..., .), Value.Type = c("A", "A", "A"), Measuring.Unit = c("...", "...", "..."), Uncertainty.Value = c(..., .., ..),   Uncertainty.Type = c(".", ".","."), Uncertainty.Unit = c("...", "...", "..."), Laboratory = c("...", "...", "..."), class = "data.frame")
Sylvain
  • 133
  • 1
  • 10
  • Per `r` tag (hover or click to see): Please provide minimal and [reproducible example(s)](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451) along with the desired output. Use `dput()` for data and specify all non-base packages with `library()` calls. – Parfait Jun 09 '20 at 21:27
  • Ok. see my edit of my question :) – Sylvain Jun 11 '20 at 08:02
  • an example of the input data would be very helpful : dput(head(myData,3)) – Waldi Jun 11 '20 at 08:19
  • See my edit of the question – Sylvain Jun 11 '20 at 09:47
  • Did my answer below not resolve this question advising the use of `convertToDateTime`? What is the remaining issue? Remember R does not have a `Time` type by itself (unlike `Date`). – Parfait Jun 11 '20 at 13:36
  • Indeed. Then, I think the easiest way would be to open myData in a .csv file because in this case, the problem is solved. Thank's for you help in this matter – Sylvain Jun 11 '20 at 15:29

2 Answers2

1

@Parfait got it :

openxlsx::convertToDateTime(myData$Begin.Date+myData$Begin.Time)

for example :

myData <- data.frame(Begin.Date = 43000:43010,
                     Begin.Time = seq(0,1,0.1)) 
openxlsx::convertToDateTime(myData$Begin.Date+myData$Begin.Time)

 [1] "2017-09-22 00:00:00 CEST" "2017-09-23 02:24:00 CEST" "2017-09-24 04:48:00 CEST" "2017-09-25 07:12:00 CEST"
 [5] "2017-09-26 09:36:00 CEST" "2017-09-27 12:00:00 CEST" "2017-09-28 14:24:00 CEST" "2017-09-29 16:48:00 CEST"
 [9] "2017-09-30 19:12:00 CEST" "2017-10-01 21:36:00 CEST" "2017-10-03 00:00:00 CEST"

What openxlsx::convertToDateTime does is to convert Begin.Date to POSIXct taking Excel Origin which is "1899-12-30", multiply the decimal part Begin.Time (between 0 and 1) by the number of seconds in a day (84600), and account for the time zone (because as.Date forces UTC) :

myData <- data.frame(Begin.Date = 43000:43010,
                     Begin.Time = seq(0,1,0.1)) 
as.POSIXct(as.Date(myData$Begin.Date,origin='1899-12-30'))+myData$Begin.Time*86400-3600*2

 [1] "2017-09-22 00:00:00 CEST" "2017-09-23 02:24:00 CEST" "2017-09-24 04:48:00 CEST" "2017-09-25 07:12:00 CEST" "2017-09-26 09:36:00 CEST"
 [6] "2017-09-27 12:00:00 CEST" "2017-09-28 14:24:00 CEST" "2017-09-29 16:48:00 CEST" "2017-09-30 19:12:00 CEST" "2017-10-01 21:36:00 CEST"
[11] "2017-10-03 00:00:00 CEST"
Waldi
  • 39,242
  • 6
  • 30
  • 78
  • is Begin.Time a number or a character? lubridate::seconds_to_period(86400*0.5) gives me "12H 0M 0S", but lubridate::seconds_to_period(86400*"0,5") gives me an error – Waldi Jun 09 '20 at 16:19
  • it is numeric. R accepts it but I do not see any change in the columns, with the values still in the range [0,1] – Sylvain Jun 09 '20 at 16:31
  • Sorry I do not understand why but it still does not work. All the values in the column Begin.Time are still between 0 and 1 – Sylvain Jun 09 '20 at 16:57
  • do you see a difference between your data and the example I updated? – Waldi Jun 09 '20 at 17:39
1

Consider adding the two numeric vectors and use convertToDateTime to render proper date time format in POSIXct type. This works since R does not specifically have a Time type (though it does have Date). But with POSIXct you carry both components and can still run date or time level operations.

myData <- read.xlsx("/path/to/myWorkbook.xlsx", sheet = 1, colNames = TRUE)

myData$Begin.DateTime <- with(myData, convertToDateTime(Begin.Date + Begin.Time))
myData$End.DateTime <- with(myData, convertToDateTime(End.Date + End.Time))

To demonstrate, if Excel worksheet looks like the below:

Spreadsheet Input

With convertToDateTime, R reproduces exact same date/time values:

myData <- read.xlsx("/path/to/myWorkbook.xlsx", sheet = 1, colNames = TRUE)

myData$DateTime <- with(myData, convertToDateTime(DATE + TIME))

head(myData, 10)

    GROUP INT        NUM CHAR  BOOL  DATE       TIME            DateTime
1   stata   8 127.666866  aEi  TRUE 41834 0.45681916 2014-07-14 10:57:49
2  python   5  34.688263  H2o FALSE 36231 0.08267816 1999-03-12 01:59:03
3   julia   1  -3.964953  lsf  TRUE 29694 0.08998882 1981-04-18 02:09:35
4       r   7  86.658479  Qvq FALSE 37560 0.42211337 2002-10-31 10:07:50
5     sas  10 -51.684354  Nwz FALSE 28965 0.68242260 1979-04-20 16:22:41
6    spss   2  57.717068  7mQ  TRUE 29167 0.99936381 1979-11-08 23:59:05
7  python  11 101.840519  Qwa FALSE 29638 0.72010194 1981-02-21 17:16:56
8  python   6  27.396547  viO FALSE 27524 0.02126105 1975-05-10 00:30:36
9   stata   5  36.561838  Z01 FALSE 30932 0.74739582 1984-09-07 17:56:14
10  julia   6 -60.987695  jSP FALSE 32613 0.25558634 1989-04-15 06:08:02

str(myData)

'data.frame':   500 obs. of  8 variables:
$ GROUP   : chr  "stata" "python" "julia" "r" ...
$ INT     : num  8 5 1 7 10 2 11 6 5 6 ...
$ NUM     : num  127.67 34.69 -3.96 86.66 -51.68 ...
$ CHAR    : chr  "aEi" "H2o" "lsf" "Qvq" ...
$ BOOL    : chr  "TRUE" "FALSE" "TRUE" "FALSE" ...
$ DATE    : num  41834 36231 29694 37560 28965 ...
$ TIME    : num  0.4568 0.0827 0.09 0.4221 0.6824 ...
$ DateTime: POSIXct, format: "2014-07-14 10:57:49" "1999-03-12 01:59:03"...
Parfait
  • 104,375
  • 17
  • 94
  • 125