Splitting a column into date/time in tidyverse

Question

I acknowledge that there are multiple similar questions, but I did not find so far an answer, which works for me, maybe because of AM/PM. I would like not to remove the latter.

I have a column Trip Start Timestamp, which looks like

12/01/2019 12:30:00 AM
12/01/2019 12:31:00 AM
12/01/2019 12:32:00 AM

I am trying to remove AM/PM and split into two variables Start date and Start time.

Ideal output:

Start date Start time
12/01/2019 12:30:00 AM

I would like to read it as time series.

My best guess so far

Date <- format(as.POSIXct(strptime(taxi_2020$`Trip Start Timestamp`, "%d/%m/%Y %H:%M:S")), format = "%m/%d/%Y")
Time <- format(as.POSIXct(strptime(taxi_2020$`Trip Start Timestamp`, "%d/%m/%Y %H:%M:S")), format = "%H:%M:S")
head(Date)
head(Time)

Gives me

[1] NA NA NA NA NA NA
[1] NA NA NA NA NA NA

Update

It looks like columns with time and date has some problems with format.

dput can be found here

So far the solution by @Ronak Shah works. Technically I finally separated the data, but probably identified another problem with UTF-8

Ian Campbell · Accepted Answer · 2020-06-16T03:09:27.873

3

Since you tagged this with tidyverse, here's a simple approach with lubridate:

library(dplyr)
library(lubridate)
data %>%
  mutate(Date = as.Date(mdy_hms(`Trip Start Timestamp`)), 
         Time = format(mdy_hms(`Trip Start Timestamp`), "%I:%M:%S %p"))
#    Trip Start Timestamp       Date        Time
#1 12/01/2019 12:30:00 AM 2019-12-01 12:30:00 AM
#2 12/01/2019 12:31:00 AM 2019-12-01 12:31:00 AM
#3 12/01/2019 12:32:00 AM 2019-12-01 12:32:00 AM

Example Data

data <- structure(list(`Trip Start Timestamp` = c("12/01/2019 12:30:00 AM", 
"12/01/2019 12:31:00 AM", "12/01/2019 12:32:00 AM")), class = "data.frame", row.names = c(NA, 
-3L))

edited Jun 16 '20 at 03:09

answered Jun 15 '20 at 23:09

Ian Campbell

23,484
14
36
57

Thanks a lot, you were the first and in tidyverse, so I will accept your answer. – Anakin Skywalker Jun 16 '20 at 02:50
Error: Column `Time` must be length 4137294 (the number of rows) or one, not 3. – Anakin Skywalker Jun 16 '20 at 03:08
the code now works, but the initial column `Trip Start Timestamp` is NA now and new columns also NA. Date as NA and Time as character NA – Anakin Skywalker Jun 16 '20 at 03:13
1

I wonder if there is some sort of factor problem. Can you [edit] your post with the output of `dput(taxi_2020[1:20,])`? It’s really late, so I’ll take a look tomorrow if someone else can’t help you. – Ian Campbell Jun 16 '20 at 03:27
Done, it is too big. I put on google drive. Definitely some formatting problem, I cannot read it in Python too, gives me an error with some characters. Thanks! – Anakin Skywalker Jun 16 '20 at 03:50
Now I have this error - Error in gsub(reg$alpha_exact[["A"]], "%A", x, ignore.case = T, perl = T) : input string 1 is invalid UTF-8 – Anakin Skywalker Jun 16 '20 at 04:28

r2evans · Answer 2 · 2020-06-15T23:23:01.610

tidyverse

dat %>%
  mutate(
    ## option 1
    psx = as.POSIXct(V1, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC"), 
    Date = format(psx, format = "%m/%d/%Y"), 
    Time = format(psx, format = "%I:%M:%S %p"), 
    ## option 2
    Date2 = sub("\\s.*", "", V1), 
    Time2 = sub("^\\S*\\s", "", V1)
  )
#                        V1                 psx       Date        Time      Date2        Time2
# 1  12/01/2019 12:30:00 AM 2019-12-01 12:30:00 12/01/2019 12:30:00 PM 12/01/2019  12:30:00 AM
# 2  12/01/2019 12:31:00 AM 2019-12-01 12:31:00 12/01/2019 12:31:00 PM 12/01/2019  12:31:00 AM
# 3 12/01/2019 12:32:00 AM  2019-12-01 12:32:00 12/01/2019 12:32:00 PM 12/01/2019 12:32:00 AM

Data

dat <- read.table(header = FALSE, text = "
12/01/2019 12:30:00 AM
12/01/2019 12:31:00 AM
12/01/2019 12:32:00 AM ", sep = "|")

score 2 · Answer 3 · answered Jun 16 '20 at 00:26

2

Note that keeping the output in the format that you have shown would return columns as characters/factors.

You can split the data on whitespace using tidyr::separate

tidyr::separate(data, `Trip Start Timestamp`, c('Start Date', 'Start Time'), 
                       sep = ' ', extra = 'merge')

#  Start Date  Start Time
#1 12/01/2019 12:30:00 AM
#2 12/01/2019 12:31:00 AM
#3 12/01/2019 12:32:00 AM

Similarly, you can use extract :

tidyr::extract(data, `Trip Start Timestamp`, c('Start Date', 'Start Time'), 
               regex = '(.*?)\\s(.*)')

answered Jun 16 '20 at 00:26

Ronak Shah

377,200
20
156
213

But the first code returns columns as characters will all NAs. Any idea why? – Anakin Skywalker Jun 16 '20 at 03:02
1

@Rookie Your data must be different then. I am using Ian's data. Can you test it on that data? – Ronak Shah Jun 16 '20 at 03:05
Does not work, unfortunately. Error: Column `Time` must be length 4137294 (the number of rows) or one, not 3 – Anakin Skywalker Jun 16 '20 at 03:07
1

@Rookie In your `dput` I think the relevant part is only `structure(list(\`Trip Start Timestamp\` = c("12/01/2019 12:15:00 AM", "12/01/2019 12:15:00 AM", "12/01/2019 12:15:00 AM"), \`Trip End Timestamp\` = c("12/01/2019 12:30:00 AM", "12/01/2019 12:15:00 AM", "12/01/2019 12:30:00 AM")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))` . Rest all doesn't get picked up ? And for this data my answer works fine for me. – Ronak Shah Jun 16 '20 at 04:05
apologize! Update to 20 rows – Anakin Skywalker Jun 16 '20 at 04:26
I get this error, using Ian's code now - Error in gsub(reg$alpha_exact[["A"]], "%A", x, ignore.case = T, perl = T) : input string 1 is invalid UTF-8 – Anakin Skywalker Jun 16 '20 at 04:28
1

@Rookie My answer still works fine on your updated data for me. Does it work for you? – Ronak Shah Jun 16 '20 at 04:30
Yes, it did, I finally managed to split the data! But it looks like I have another problem with UTF-8. See the updated question. I will dig into it! Appreciate your help. Not sure why tidyverse solutions failed. – Anakin Skywalker Jun 16 '20 at 04:39

score 2 · Answer 4 · answered Jun 16 '20 at 00:34

2

We can use base R to split

out <- do.call(rbind.data.frame, strsplit(data[[1]],
        "(?<=[0-9]) (?=[0-9])", perl = TRUE))
names(out) <- c('Start Date', 'Start Time')
out
#  Start Date  Start Time
#1 12/01/2019 12:30:00 AM
#2 12/01/2019 12:31:00 AM
#3 12/01/2019 12:32:00 AM

data

data <- structure(list(`Trip Start Timestamp` = c("12/01/2019 12:30:00 AM", 
"12/01/2019 12:31:00 AM", "12/01/2019 12:32:00 AM")), class = "data.frame", row.names = c(NA, 
-3L))

answered Jun 16 '20 at 00:34

akrun

874,273
37
540
662

But it gives me Error in data[[1]] : object of type 'closure' is not subsettable – Anakin Skywalker Jun 16 '20 at 03:04
1

@Rookie Can you check the `str(yourdata)`. I assume it is a `data.frame` or `tibble` as it should work with those structures – akrun Jun 16 '20 at 18:29
tibble [4,137,294 x 21] (S3: tbl_df/tbl/data.frame) $ Start Date : chr [1:4137294] "12/01/2019" "12/01/2019" "12/01/2019" "12/01/2019" ... $ Start Time : chr [1:4137294] "12:15:00 AM" "12:15:00 AM" "12:15:00 AM" "12:15:00 AM" ... $ End Date : chr [1:4137294] "12/01/2019" "12/01/2019" "12/01/2019" "12/01/2019" ... $ End Time : chr [1:4137294] "12:30:00 AM" "12:15:00 AM" "12:30:00 AM" "12:30:00 AM" ... – Anakin Skywalker Jun 16 '20 at 18:38
@Rookie Are you splitting on "Start Time"? In that case use `strsplit(data[["Start Time"]], ...` – akrun Jun 16 '20 at 18:41
@Rookie Based on the image, The "Trip End Timestamp" is 3rd column. So, you may need `strsplit(taxi_2020_test[["Trip End Timestamp"]], ...` – akrun Jun 16 '20 at 18:42
Sorry, did not get what you mean, I separated into Start Date and Start Time already as Ronak suggested above – Anakin Skywalker Jun 16 '20 at 18:43
@Rookie I meant that the `strsplit(data[[1]]` is based on the assumption that it is the first column, if it is not, then change it to the index of the column or the actual column name – akrun Jun 16 '20 at 18:44

Splitting a column into date/time in tidyverse

4 Answers4

tidyverse

data

Linked