7

I am working on "Localization Data for Person Activity Data Set" dataset from UCI and in this data set there is a column of date and time(both in one column) with following format:

27.05.2009 14:03:25:777
27.05.2009 14:03:25:183
27.05.2009 14:03:25:210
27.05.2009 14:03:25:237
...

I am wondering if there is anyway to convert this column to timestamp using R.

rcs
  • 67,191
  • 22
  • 172
  • 153
Soroosh
  • 477
  • 2
  • 7
  • 18

1 Answers1

16

First of all, we need to substitute the colon separating the milliseconds from the seconds to a dot, otherwise the final step won't work (thanks to Dirk Eddelbuettel for this one). Since in the end R will use the separators it wants, to be quicker, I'll just go ahead and substitute all the colons for dots:

x <- "27.05.2009 14:03:25:777"  # this is a simplified version of your data
y <- gsub(":", ".", x)          # this is your vector with the aforementioned substitution

By the way, this is how your vector should look after gsub:

> y
[1] "27.05.2009 14.03.25.777"

Now, in order to have it show the milliseconds, you first need to adjust an R option and then use a function called strptime, which will convert your date vector to POSIXlt (an R-friendly) format. Just do the following:

> options(digits.secs = 3)           # this tells R you want it to consider 3 digits for seconds.
> strptime(y, "%d.%m.%Y %H:%M:%OS")  # this finally formats your vector
[1] "2009-05-27 14:03:25.777"

I've learned this nice trick here. This other answer also says you can skip the options setting and use, for example, strptime(y, "%d.%m.%Y %H:%M:%OS3"), but it doesn't work for me. Henrik noted that the function's help page, ?strptime states that the %OS3 bit is OS-dependent. I'm using an updated Ubuntu 13.04 and using %OS3 yields NA.

When using strptime (or other POSIX-related functions such as as.Date), keep in mind some of the most common conversions used (edited for brevity, as suggested by DWin. Complete list at strptime):

  • %a Abbreviated weekday name in the current locale.
  • %A Full weekday name in the current locale.
  • %b Abbreviated month name in the current locale.
  • %B Full month name in the current locale.
  • %d Day of the month as decimal number (01–31).
  • %H Hours as decimal number (00–23). Times such as 24:00:00 are accepted for input.
  • %I Hours as decimal number (01–12).
  • %j Day of year as decimal number (001–366).
  • %m Month as decimal number (01–12).
  • %M Minute as decimal number (00–59).
  • %p AM/PM indicator in the locale. Used in conjunction with %I and not with %H.
  • `%S Second as decimal number (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
  • %U Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
  • %w Weekday as decimal number (0–6, Sunday is 0).
  • %W Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
  • %y Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19
  • %Y Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC)
Community
  • 1
  • 1
Waldir Leoncio
  • 10,853
  • 19
  • 77
  • 107
  • 4
    a) You don't need the long format for `as.Date()`. b) wrap `as.POSIXct()` around `strptime()` and you get the more compact POSIXct. c) You missed the subseconds. That requires a `gsub()` or something else as the format requires a dot; the colon will not work. – Dirk Eddelbuettel Sep 28 '13 at 01:50
  • @DirkEddelbuettel, thanks for the heads up, I was wondering if those were really subseconds or what. Honestly, I don't know how to handle them. Would you show us (maybe by editing the post)? – Waldir Leoncio Sep 28 '13 at 01:56
  • 3
    I think posting the entire table of conversion formats from the help page is a waste of space. Better to just link to a page where someone can find it or just `?strptime`. – IRTFM Sep 28 '13 at 02:05
  • @DWin, I've though about just inserting the ones I thought would be more useful, but you never know when a link might break. Feel free to propose an edit, though. – Waldir Leoncio Sep 28 '13 at 02:10
  • 2
    I did suggest alternatives. If `?strptime` fails, then the R installation is broken. – IRTFM Sep 28 '13 at 02:18
  • 1
    Hey guys thank you all for your answers. Actually my main problems are (1) those last 3 digit number(millisecond) and (2) date format which is date.month.year. I don't know how to handle them before use strptime function! – Soroosh Sep 28 '13 at 06:15
  • @Soroosh, please see edited answer, I think it might resolve your issue now. – Waldir Leoncio Sep 28 '13 at 08:57
  • @DWin, I was also trying to account for the laziness of the reader to click on a link, but since I was editing the answer, I went ahead and shortened the list like you suggested. – Waldir Leoncio Sep 28 '13 at 09:00
  • 3
    Regarding "`strptime(y, "%d.%m.%Y %H:%M:%OS3")`, but it doesn't work for me", see also the comment to the example in `?strptime`: `## time to sub-second accuracy (**if supported by the OS**); format(Sys.time(), "%H:%M:%OS3")`. – Henrik Sep 28 '13 at 09:06
  • @Henrik, good eye, I'll update the answer. – Waldir Leoncio Sep 28 '13 at 09:09