To calculate time differences, is there a way to change all : in a data frame?

Question

I was wondering if there is a way to turn every colon in a data frame into a period, or to delete them from the data set altogether?

I asked people what time they went to sleep and they entered, for example, 21:00. But now that I need to subtract the times to find how long they were asleep, I don't think R likes the colon. Is there any way to solve this and remove every : from the data set?

Any help would be greatly appreciated. Thank you.

I have thought about using the code:

data %>% dplyr::mutate(., QS1 = dplyr::recode(QS1, "01:00" = "0100"))

but it would take me ages to individually go through and do this to over 300 pieces of data

Andre Wildberg · Answer 1 · 2023-03-19T00:34:42.127

EDIT: To remove the colon, use the code below, but note that doing math on time stamps is perfectly fine in R, e.g. as.POSIXct(data$QS1, format="%H:%M") + 20 adds 20 seconds. A more detailed approach would need more information about your real data: for example, did you also record date information, or is it just an hour:minute data set?

An approach using str_replace

library(dplyr)
library(stringr)

data %>% 
  mutate(QS1_new = str_replace(QS1, ":", ""))
    QS1 QS1_new
1 00:00    0000
2 00:00    0000
3 00:00    0000
4 00:00    0000
5 00:00    0000

Or with base R

data$QS1_new <- sub(":", "", data$QS1)
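Since the question asks about every colon in the data frame rather than a single column, here is a minimal base R sketch that applies the same replacement to all character columns. The example data and the column names QS1/QS2 are assumptions for illustration only.

```r
# Hypothetical example data; the column names are assumed, not taken from the asker's file
data <- data.frame(ID  = 1:3,
                   QS1 = c("21:00", "22:00", "21:30"),
                   QS2 = c("06:00", "07:30", "07:00"),
                   stringsAsFactors = FALSE)

# Find the character columns and strip every colon from them;
# numeric columns such as ID are left untouched
is_chr <- vapply(data, is.character, logical(1))
data[is_chr] <- lapply(data[is_chr], function(x) gsub(":", "", x, fixed = TRUE))

data
```

The stripped values are still character strings such as "2100", so they would need as.numeric() before any arithmetic, and that arithmetic would not be clock arithmetic.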

dandrews · Answer 2 · 2023-03-18T19:01:24.440

It may be useful to look into datetime math, since subtracting the colon-stripped numbers breaks across midnight, and even within one day 2130 - 2030 = 100, which is not equal to 1 hr (how long they slept). There are a couple of packages useful for this. Here's a quick example.

install.packages('lubridate')
install.packages('hms')
library(dplyr)     # provides tibble(), mutate() and %>%
library(lubridate)
library(hms)

df <- tibble(time1=seq.POSIXt(as_datetime('03-23-23 21:00',format='%m-%d-%y %H:%M'), # using lubridate::as_datetime(), which defaults to UTC
                        as_datetime('03-24-23 08:00',format='%m-%d-%y %H:%M'),
                        by='hours'),
             time2=seq.POSIXt(strptime('03-23-23 22:00',format='%m-%d-%y %H:%M',tz='UTC'), # using base::strptime(), set to UTC to match time1
                              strptime('03-24-23 09:00',format='%m-%d-%y %H:%M',tz='UTC'),
                              by='hours'))

df %>% mutate(timediff=as_hms(difftime(time2,time1))) # difference as hh:mm:ss via hms::as_hms()
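If only clock times are available (no dates), one way to see the midnight problem and work around it is to convert to minutes since midnight and wrap negative differences. This is a rough base R sketch: the helper to_minutes() and the sample times are hypothetical, and it assumes every person woke up less than 24 hours after going to bed.

```r
# Hypothetical helper: convert "HH:MM" strings to minutes since midnight
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

bed  <- c("21:00", "20:30", "23:15")   # made-up bedtimes
wake <- c("06:00", "21:30", "07:00")   # made-up wake times

# Naive approach: strip the colon and subtract the resulting numbers
naive <- as.numeric(sub(":", "", wake)) - as.numeric(sub(":", "", bed))

# Clock-aware approach: wrap around midnight (1440 minutes per day), then convert to hours
slept_h <- ((to_minutes(wake) - to_minutes(bed)) %% 1440) / 60

data.frame(bed, wake, naive, slept_h)
```

The naive column contains values such as -1500 or 100 that are not durations in any unit, while slept_h gives hours slept under the stated assumption.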

Len Greski · Answer 3 · 2023-03-18T19:31:17.267

The original question asks how to replace colons in time values with periods so differences can be calculated. Unfortunately, since there are 60 minutes in an hour, 14:30 (2:30 PM) does not equal 14.30 hours; it equals 14.5. Therefore, any math operations on the decimal versions of time values are likely to introduce errors.

For example, if someone slept between 6:00 PM and 11:30PM, the time slept is 5.5 hours, not 11.3 - 6.0 = 5.3.
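The same arithmetic can be checked in base R. This is only an illustrative sketch: it uses the 24-hour forms 18:00 and 23:30, and the date pasted in front of the times is an arbitrary placeholder.

```r
day <- "2023-03-01"  # arbitrary placeholder date, only needed so the times can be parsed

start <- as.POSIXct(paste(day, "18:00"), format = "%Y-%m-%d %H:%M", tz = "UTC")
end   <- as.POSIXct(paste(day, "23:30"), format = "%Y-%m-%d %H:%M", tz = "UTC")

# Colon replaced by a period and treated as a decimal number: 23.30 - 18.00
decimal_diff <- as.numeric(sub(":", ".", "23:30")) - as.numeric(sub(":", ".", "18:00"))

# Proper clock arithmetic on date-time values
clock_diff <- as.numeric(difftime(end, start, units = "hours"))

c(decimal = decimal_diff, clock = clock_diff)
```

The decimal version comes out at about 5.3, while the date-time version gives the intended 5.5 hours.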

Complicating matters is the fact that if the end times for sleep occur on the following day (e.g. someone goes to sleep at 10:00 PM and wakes up at 6:00 AM the next day), the dates are required for the math to be correct.

Fortunately, we can calculate differences between date-time values relatively simply with the lubridate package.

Without a minimal reproducible example, we'll create some data that has person names and start / end dates & times for sleeping.

# create a data frame with time data
aFile <- "name,startTime,endTime
Jane,2023-03-01 20:00,2023-03-02 06:00
Joe,2023-03-01 18:00,2023-03-02 04:30
Kuthilda,2023-03-01 19:37,2023-03-02 02:10
Patricia,2023-03-02 06:00,2023-03-02 14:00"

df <- read.csv(text=aFile)

As specified in this example the date time information is read as character strings, so next we'll load the lubridate package and use its parse_date_time() function to calculate the sleep times.

library(lubridate)
df$sleepTime <- parse_date_time(df$endTime,"%Y-%m-%d %H:%M") - parse_date_time(df$startTime,"%Y-%m-%d %H:%M")

Finally, we print the results.

df

> df 
      name        startTime          endTime   sleepTime
1     Jane 2023-03-01 20:00 2023-03-02 06:00 10.00 hours
2      Joe 2023-03-01 18:00 2023-03-02 04:30 10.50 hours
3 Kuthilda 2023-03-01 19:37 2023-03-02 02:10  6.55 hours
4 Patricia 2023-03-02 06:00 2023-03-02 14:00  8.00 hours
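For further calculations (means, plots, models) it can help to turn the difftime result into plain numbers with explicit units. The sketch below uses base R as.POSIXct() instead of lubridate so it runs on its own, and the two rows are copied from the example data above; treat it as one possible follow-up step rather than part of the original answer.

```r
# Two rows from the example above, parsed with base R
start <- as.POSIXct(c("2023-03-01 20:00", "2023-03-01 19:37"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")
end   <- as.POSIXct(c("2023-03-02 06:00", "2023-03-02 02:10"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")

sleepTime <- end - start   # a difftime; R picks the units automatically

# Ask for hours explicitly so the numbers do not depend on the automatic unit choice
hours <- as.numeric(sleepTime, units = "hours")
mean(hours)
```

Requesting the units explicitly avoids surprises when short differences would otherwise be reported in minutes.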

jay.sf · Accepted Answer · 2023-03-19T08:12:45.743

You probably have data like this

df
#   ID   QS1   QS2            X
# 1  1 21:00 06:00 -0.008238534
# 2  2 22:00 07:30  0.862295486
# 3  3 21:30 07:00  0.451935629

We only have times. Since somebody could have gone to bed earlier and woken up on the same day, we might run into some problems.

Anyway, you can paste a date before your times, usually two consecutive days. It is not necessary to use the correct dates; the main thing is that they are two consecutive days. Check whether there are "unusual" sleepers as mentioned before.

tcols <- c('QS1', 'QS2')
df[tcols] <- Map(paste, c('2023-03-18', '2023-03-19'), df[tcols])

Then all you need is difftime, which recognizes this date-time format automatically.

df$dif <- difftime(df$QS2, df$QS1, units='hours')
df
#   ID              QS1              QS2          X       dif
# 1  1 2023-03-18 21:00 2023-03-19 06:00  0.8199990 9.0 hours
# 2  2 2023-03-18 22:00 2023-03-19 07:30 -0.8152736 9.5 hours
# 3  3 2023-03-18 21:30 2023-03-19 07:00  1.1706443 9.5 hours

where

str(df)
# 'data.frame': 3 obs. of  5 variables:
# $ ID : int  1 2 3
# $ QS1: chr  "2023-03-18 21:00" "2023-03-18 22:00" "2023-03-18 21:30"
# $ QS2: chr  "2023-03-19 06:00" "2023-03-19 07:30" "2023-03-19 07:00"
# $ X  : num  0.982 -0.252 -0.76
# $ dif: 'difftime' num  9 9.5 9.5
#  ..- attr(*, "units")= chr "hours"

Note: To convert character times (with the date already included) to "POSIXct" format, read these posts: [1] [2]
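One possible way to handle the "unusual" sleepers and the "POSIXct" conversion in one go is sketched below. It is an assumption-laden variant, not the method above: both times get the same placeholder date, and a day is added to the wake time only when it is not later than the bedtime. Row 4 is a hypothetical person who went to bed after midnight.

```r
# Example data; row 4 is a made-up sleeper who went to bed at 01:00
df <- data.frame(ID  = 1:4,
                 QS1 = c("21:00", "22:00", "21:30", "01:00"),
                 QS2 = c("06:00", "07:30", "07:00", "08:00"))

# Parse both columns on the same placeholder date (the date itself is arbitrary)
bed  <- as.POSIXct(paste("2023-03-18", df$QS1), format = "%Y-%m-%d %H:%M", tz = "UTC")
wake <- as.POSIXct(paste("2023-03-18", df$QS2), format = "%Y-%m-%d %H:%M", tz = "UTC")

# If the wake time is not after the bedtime, assume it belongs to the next day
crossed <- wake <= bed
wake[crossed] <- wake[crossed] + 24 * 60 * 60   # POSIXct arithmetic is in seconds

df$dif <- difftime(wake, bed, units = "hours")
df
```

This assumes nobody slept 24 hours or longer; using UTC sidesteps daylight-saving jumps in the placeholder dates.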

Data:

df <- structure(list(ID = 1:3, QS1 = c("21:00", "22:00", "21:30"), 
    QS2 = c("06:00", "07:30", "07:00"), X = c(-0.0616202790380149, 
    -1.5162238145615, -0.413276568120249)), class = "data.frame", row.names = c(NA, 
-3L))
