I'm happy with the way I've written the summary, and it works, sort of. I need the min, max, and mean of ride lengths, grouped by member type (member_casual), in my data set. The ride_length column is in hh:mm:ss format. The grouping works just fine; the min, max, and mean, not so much.

    library(dplyr)

    df %>%   # df is a placeholder for the data frame's name
    group_by(member_casual) %>%
    summarise(min_ride_length = min(ride_length),
              max_ride_length = max(ride_length),
              mean_ride_length = mean(ride_length))

I searched Stack Overflow and other places via Google, tried different hms functions at various points in my code chunks, and got errors every time. The goal is to get the output in hh:mm:ss format, with positive numbers.
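Assuming ride_length is stored as plain text, a short sketch of why the summary misbehaves: `min()` and `max()` compare character strings alphabetically rather than as durations, and `mean()` cannot average text at all. The two values below are hypothetical examples.

```r
# Hypothetical ride lengths stored as text (character), not as durations
ride_length <- c("06:17:23", "5:45:45")

# Character comparison is alphabetical: "0" sorts before "5",
# so min() returns "06:17:23" although 5:45:45 is the shorter ride
min(ride_length)  # "06:17:23"
max(ride_length)  # "5:45:45"

# mean() only works on numbers; on text it warns and returns NA
mean(ride_length)  # NA, with a warning
```

Converting each value to a number (for example, seconds) before summarising avoids both problems.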

samtastic
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Feb 10 '23 at 21:13
  • Please provide enough code so others can better understand or reproduce the problem. – Community Feb 10 '23 at 21:14
  • Sorry guys, I added an image but apparently it didn't take? I'm new to all this. Thanks for your help! – samtastic Feb 10 '23 at 21:18

1 Answer

If ride_length is a character column in "hh:mm:ss" format, the following code should work. It converts each value to a number of seconds, calculates the summary statistics on those numbers, and then converts the results back to h:mm:ss text.

  library(dplyr)

  df %>%  # df is your data frame, e.g. the example data below
  # Convert each "hh:mm:ss" string to a number of seconds
  mutate(ride_length_s = sapply(strsplit(ride_length, split = ":"), function(x){
    x <- as.numeric(x)
    x[1]*3600 + x[2]*60 + x[3]
  })) %>%
  group_by(member_casual) %>%
  summarise(min_ride_length = min(ride_length_s),
            max_ride_length = max(ride_length_s),
            mean_ride_length = mean(ride_length_s)) %>%
  # Convert each summary column from seconds back to h:mm:ss text
  mutate(across(ends_with("_ride_length"), function(x){
    hrs  <- floor(x/3600)
    mins <- floor(x %% 3600 / 60)
    secs <- floor(x %% 60)
    paste(hrs, sprintf("%02d", mins), sprintf("%02d", secs), sep = ":")
  }))

Using the example data set of

df <- data.frame(ride_length = c("12:34:56", "23:12:54", "06:17:23", "5:45:45"),
                 member_casual = c("A", "A", "B", "B"),
                 stringsAsFactors = FALSE)

we get

# A tibble: 2 × 4
  member_casual min_ride_length max_ride_length mean_ride_length
  <chr>         <chr>           <chr>           <chr>           
1 A             12:34:56        23:12:54        17:53:55        
2 B             5:45:45         6:17:23         6:01:34              
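For comparison, a base-R sketch of the same seconds round trip on the example data above, without dplyr. The helper names `hms_to_secs()` and `secs_to_hms()` are hypothetical, and unlike the code above, this formatter zero-pads the hours, so 5:45:45 prints as 05:45:45.

```r
# Hypothetical helper: "hh:mm:ss" text -> number of seconds
hms_to_secs <- function(x) {
  parts <- strsplit(as.character(x), split = ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3]
  }, numeric(1))
}

# Hypothetical helper: seconds -> zero-padded "hh:mm:ss" text
secs_to_hms <- function(s) {
  s <- floor(unname(s))  # drop names and fractional seconds
  sprintf("%02d:%02d:%02d", s %/% 3600, (s %% 3600) %/% 60, s %% 60)
}

df <- data.frame(ride_length = c("12:34:56", "23:12:54", "06:17:23", "5:45:45"),
                 member_casual = c("A", "A", "B", "B"),
                 stringsAsFactors = FALSE)
df$ride_length_s <- hms_to_secs(df$ride_length)

# Split the seconds by member type, summarise each group, then format
groups <- split(df$ride_length_s, df$member_casual)
result <- data.frame(
  member_casual    = names(groups),
  min_ride_length  = secs_to_hms(vapply(groups, min,  numeric(1))),
  max_ride_length  = secs_to_hms(vapply(groups, max,  numeric(1))),
  mean_ride_length = secs_to_hms(vapply(groups, mean, numeric(1))),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(result)
```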
Dubukay
  • Thank you for giving this a go! I copied and pasted your code to see if it would work and got several errors: Error in `mutate()`: ℹ In argument: `ride_length_s = sapply(...)`. Caused by error in `strsplit()`: ! non-character argument Run `rlang::last_error()` to see where the error occurred. – samtastic Feb 10 '23 at 21:54
  • Okay, then we'll need a reproducible example. Can you edit your question to include the output from `dput(head(whatever_your_data_frame_is_called))`? – Dubukay Feb 10 '23 at 22:01
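The `non-character argument` error means ride_length was not stored as text, so `strsplit()` had nothing to split. A sketch of a more tolerant conversion, assuming the column is one of: character "hh:mm:ss", a factor of such strings, a `difftime` (objects from the hms package inherit from `difftime`), or plain numeric seconds. The name `to_seconds()` is hypothetical; checking `class()` on the column shows which case applies.

```r
# Hypothetical helper: convert a ride-length column to numeric seconds,
# whatever type it was read in as
to_seconds <- function(x) {
  if (inherits(x, "difftime")) {
    # difftime, and hms objects, which inherit from difftime
    return(as.numeric(x, units = "secs"))
  }
  if (is.numeric(x)) {
    # assume plain numbers are already seconds
    return(as.numeric(x))
  }
  # character or factor "hh:mm:ss": split on ":" and combine the parts
  parts <- strsplit(as.character(x), split = ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3]
  }, numeric(1))
}

to_seconds(c("01:00:05", "5:45:45"))         # 3605 20745
to_seconds(factor("00:02:30"))               # 150
to_seconds(as.difftime(90, units = "mins"))  # 5400
```

In the answer's pipeline, `mutate(ride_length_s = to_seconds(ride_length))` could then stand in for the `sapply(strsplit(...))` step.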