-2

I'm brand new to R/RStudio. I have a data frame that has two date-times, started_at and ended_at. I'd like to split both into separate date and time columns. I have figured out how to split one or the other, but I'm struggling to figure out both, other than to split one, create a new data frame, then split the other. Is there an easier/faster way? I'm currently using mutate and separate.

Thanks so much for any help or advice!

trip_data %>% mutate(started_at = ymd_hms(started_at)) %>%
  separate(started_at, into = c("start_date", "start_time"), sep = " ", remove = FALSE)

I hope I'm doing this right, and apologize if I'm not. This is my reproducible example. I have three columns, assume thousands of rows, I need to separate two date-time columns but not the other. Thank you hugh-allan for the start here. Alternatively, as hugh-allan suggested, I may only need to pull the ymd out into a separate column for both for calculating intervals.

tibble(
  start = '2021-09-18 16:45:32', 
  end = '2021-09-18 16:50:15',
  name = 'trip'
) %>% 
  {. ->> my_data}
  • 1
    It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Sep 18 '21 at 02:58

2 Answers2

0

If we want to loop over multiple columns using separate use a for loop

library(dplyr)
library(tidyr)
library(stringr)
for(nm in c('started_at', 'ended_at')) {
  trip_data <- trip_data %>%
     separate(!! rlang::sym(nm), into = str_c(str_remove(nm, '_at'), 
       c('_date', '_time')), sep = " ", remove = FALSE)
}

data

trip_data <- structure(list(started_at = "2021-09-17 20:51:59", ended_at = "2021-09-17 20:52:04"), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

When working with dates and datetimes in R, the lubridate package is your friend. It has many useful functions, and has functions specifically designed to return the date and time components of a datetime.

I highly recommend reading and working through the R for Data Science ebook if you are new to R, and especially Chapter 16 - Dates and times for a good walk through of date and time handling using the lubridate package.

For your example where you have only two columns you would like to extract dates and times for, this can be done relatively easily by repeating the same code for each of the columns. If you had many columns, a for loop as suggested by @akrun might be the go, or you could transform your data from wide to long format.

Firstly, we make some sample data

tibble(
  start = '2021-09-18 16:45:32', 
  end = '2021-09-18 16:50:15'
) %>% 
  {. ->> my_data}

my_data 

# # A tibble: 1 x 2
# start               end                
# <chr>               <chr>              
# 2021-09-18 16:45:32 2021-09-18 16:50:15

So far, the datetimes are in character format (as seen by the <chr> under the column names in the tibble preview). So, next we convert them into 'proper' datetime format, using lubridate::ymd_hms(). ymd_hms() takes a character format datetime arranged in 'YYYY-MM-DD HH:MM:SS' style, and converts it to a datetime dttm format in R.

my_data %>% 
  mutate_all(ymd_hms) %>% 
  {. ->> my_data_2}

my_data_2

# # A tibble: 1 x 2
# start               end                
# <dttm>              <dttm>             
# 2021-09-18 16:45:32 2021-09-18 16:50:15

Then, we can use lubridate::date() to pull out just the 'date' component of the datetime, and hms::as_hms to pull out the 'time' component (see this answer explaining hms::as_hms).

my_data_2 %>% 
  mutate(
    start_date = date(start), 
    start_time = hms::as_hms(start), 
    end_date = date(end), 
    end_time = hms::as_hms(end)
  ) %>% 
  {. ->> my_data_3}

my_data_3

# # A tibble: 1 x 6
# start               end                 start_date start_time end_date   end_time
# <dttm>              <dttm>              <date>     <time>     <date>     <time>  
# 2021-09-18 16:45:32 2021-09-18 16:50:15 2021-09-18 16:45:32   2021-09-18 16:50:15

Depending on exactly what you want to do, it may be wise to store your 'times' as datetimes rather than just the time component. This makes calculation of durations or time differences easier, especially if the time periods extend over multiple dates.

Additionally, lubridate can also pull out individual time components such as hour, minute and second, if that is required. For example, finding these components of the start column only:

my_data %>% 
  select(start) %>% 
  mutate(
    start_hour = hour(start), 
    start_min = minute(start),
    start_sec = second(start)
  )

# # A tibble: 1 x 4
# start               start_hour start_min start_sec
# <chr>                    <int>     <int>     <dbl>
# 2021-09-18 16:45:32         16        45        32
hugh-allan
  • 1,170
  • 5
  • 16