1

I am now working on a dataset that uses a date time (MM/DD/YYYY HH:00:00 A/PM) format to save timeframe, I would like to look into insight in the intervals in hours, but after using separate, the AM/PM part of the date had disappeared, how can I code to make the A/PM remain?

My code is as below:

hourlyCalories_merged %>%
  separate(ActivityHour, c("ActivityDay", "ActivityHour_sep"), sep=" ")

and it transformed the data from

Activity Hour - MM/DD/YYYY HH:00:00 A/PM

to

Activity Day - MM/DD/YYYY and ActivityHour_sep - HH:00:00

only.

while my expected output of ActivityHour_sep is HH:00:00 A/PM

jay.sf
  • 60,139
  • 8
  • 53
  • 110
Brian Mak
  • 15
  • 4
  • 2
    Please share a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) or [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with an example input and your expected output. – Martin Gal Apr 09 '23 at 22:42

1 Answers1

1

Since there are two spaces in the time stamp, you may want to split by "\\s(?=\\d)", space followed by a digit, in order to get two pieces instead of three.

df |>
  tidyr::separate(ActivityHour, c("ActivityDay"," ActivityHour_sep"), sep="\\s(?=\\d)")
#    ActivityDay  ActivityHour_sep         X
# 1   04/10/2023       12:32:25 AM 0.9148060
# 2   04/10/2023       12:32:26 AM 0.9370754
# 3   04/10/2023       12:32:27 AM 0.2861395
# 4   04/10/2023       12:32:28 AM 0.8304476
# 5   04/10/2023       12:32:29 AM 0.6417455
# 6   04/10/2023       12:32:30 AM 0.5190959
# 7   04/10/2023       12:32:31 AM 0.7365883
# 8   04/10/2023       12:32:32 AM 0.1346666
# 9   04/10/2023       12:32:33 AM 0.6569923
# 10  04/10/2023       12:32:34 AM 0.7050648
# 11  04/10/2023       12:32:35 AM 0.4577418

Base R solution (about 20% faster):

do.call(rbind.data.frame, strsplit(df$ActivityHour, "\\s(?=\\d)", perl=TRUE)) |> 
  setNames(c("ActivityDay"," ActivityHour_sep")) |>
  cbind(df[setdiff(names(df), 'ActivityHour')])
#    ActivityDay  ActivityHour_sep         X
# 1   04/10/2023       12:32:25 AM 0.9148060
# 2   04/10/2023       12:32:26 AM 0.9370754
# 3   04/10/2023       12:32:27 AM 0.2861395
# 4   04/10/2023       12:32:28 AM 0.8304476
# 5   04/10/2023       12:32:29 AM 0.6417455
# 6   04/10/2023       12:32:30 AM 0.5190959
# 7   04/10/2023       12:32:31 AM 0.7365883
# 8   04/10/2023       12:32:32 AM 0.1346666
# 9   04/10/2023       12:32:33 AM 0.6569923
# 10  04/10/2023       12:32:34 AM 0.7050648
# 11  04/10/2023       12:32:35 AM 0.4577418

However, it might be better to use the POSIX format instead of time in character format so that you can perform time calculations. You can use strptime for that.

df <- transform(df, ActivityHour=strptime(ActivityHour, '%m/%d/%Y %H:%M:%S %p', tz='GMT'))
df
#           ActivityHour         X
# 1  2023-04-10 00:32:25 0.9148060
# 2  2023-04-10 00:32:26 0.9370754
# 3  2023-04-10 00:32:27 0.2861395
# 4  2023-04-10 00:32:28 0.8304476
# 5  2023-04-10 00:32:29 0.6417455
# 6  2023-04-10 00:32:30 0.5190959
# 7  2023-04-10 00:32:31 0.7365883
# 8  2023-04-10 00:32:32 0.1346666
# 9  2023-04-10 00:32:33 0.6569923
# 10 2023-04-10 00:32:34 0.7050648
# 11 2023-04-10 00:32:35 0.4577418

where

class(df$ActivityHour)
# [1] "POSIXlt" "POSIXt" 

Data:

df <- structure(list(ActivityHour = c("04/10/2023 12:32:25 AM", "04/10/2023 12:32:26 AM", 
"04/10/2023 12:32:27 AM", "04/10/2023 12:32:28 AM", "04/10/2023 12:32:29 AM", 
"04/10/2023 12:32:30 AM", "04/10/2023 12:32:31 AM", "04/10/2023 12:32:32 AM", 
"04/10/2023 12:32:33 AM", "04/10/2023 12:32:34 AM", "04/10/2023 12:32:35 AM"
), X = c(0.914806043496355, 0.937075413297862, 0.286139534786344, 
0.830447626067325, 0.641745518893003, 0.519095949130133, 0.736588314641267, 
0.13466659723781, 0.656992290401831, 0.705064784036949, 0.45774177624844
)), row.names = c(NA, -11L), class = "data.frame")
jay.sf
  • 60,139
  • 8
  • 53
  • 110