62

I'm currently working in tick data with R and I would like to merge date and time into a single object as I need to get a precise time object to compute some statistics on my data. Here is how my data looks like:

               date       time      price flag    exchange
2   XXH10   2010-02-02   08:00:03   2787 1824        E
3   XXH10   2010-02-02   08:00:04   2786    3        E
4   XXH10   2010-02-02   08:00:04   2787    6        E
5   XXH10   2010-02-02   08:00:04   2787    1        E
6   XXH10   2010-02-02   08:00:04   2787    1        E

Basically, I would like to merge the columns "date" and "time" into a single one.

Eric Platon
  • 9,819
  • 6
  • 41
  • 48
marino89
  • 899
  • 1
  • 10
  • 16
  • If you specify which statistics, someone might come up with a better solution than merging date and time. – Ryogi Jul 23 '12 at 09:08

2 Answers2

89

Create a datetime object with as.POSIXct:

as.POSIXct(paste(x$date, x$time), format="%Y-%m-%d %H:%M:%S")
[1] "2010-02-02 08:00:03 GMT" "2010-02-02 08:00:04 GMT" "2010-02-02 08:00:04 GMT"
[4] "2010-02-02 08:00:04 GMT" "2010-02-02 08:00:04 GMT"
Andrie
  • 176,377
  • 47
  • 447
  • 496
16

Of course, more elegant solution (arguably) is possible with extra package. When working with dates it's lubridate package:

library(lubridate)

with(x, ymd(date) + hms(time))

should produce POSIXlt vector.

UPDATE:

There is another solution using general purpose date and time conversion package anytime (based on C++ library Boost date_time):

 library(anytime)

 with(x, anytime(paste(date, time)))

Indeed, comparing anytime with both base R and lubridate (deservedly considered rather slow - see Why are my functions on lubridate dates so slow?) C++ (anytime) wins:

 x = read.csv(text = 'date,time
2010-02-02,08:00:03
2010-02-02,08:00:04
2010-02-02,08:00:04
2010-02-03,08:00:04
2010-02-04,08:00:05
2010-02-04,08:00:05
2010-02-04,08:00:06
2010-02-04,08:00:07
2010-02-04,08:00:08
2010-02-04,08:00:14')

 microbenchmark::microbenchmark(
   base = with(x, as.POSIXct(paste(date, time), format="%Y-%m-%d %H:%M:%S")),
   anytime = with(x, anytime::anytime(paste(date, time))),
   lubri = with(x, lubridate::ymd(date) + lubridate::hms(time)),
   times = 1000L
)
Unit: microseconds
  expr      min        lq       mean   median        uq        max  neval
 base       71.163   91.2555   104.38747  104.785  112.1185   256.997  1000
 anytime    40.508   52.5385   63.46973   61.843   68.5730    221.076  1000
 lubri      1596.490 1850.4400 2235.34254 1909.588 2033.096   110751.622  1000
topchef
  • 19,091
  • 9
  • 63
  • 102
  • 6
    From my experience, `ymd` and `hms` are way slower than other options where you provide the format manually. So when working with large data, I would recommend `as.POSIXct` (or `fasttime::fastPOSIXct`). – der_grund Sep 27 '17 at 12:45
  • added `anytime` solution with benchmarks that show `anytime` runs the fastest. – topchef Nov 02 '20 at 22:51