1

I have a dataframe 'df1' that has a posixct column 'Date' and other data columns with values associated with dates and times in 'Date'. The date look like this:

Date Value1 Value2
2022-03-14 13:00:00 1 3
2022-03-14 13:10:00 2 4
2022-03-14 13:20:00 3 5
2022-03-14 13:30:00 4 6
2022-03-14 13:40:00 5 7
2022-03-14 13:50:00 6 8
2022-03-14 14:00:00 10 40
2022-03-14 14:10:00 20 50
2022-03-14 14:20:00 30 60
2022-03-14 14:30:00 40 70
2022-03-14 14:40:00 50 80
2022-03-14 14:50:00 60 90

I would like to average the values in 'Value1' and 'Value2' over all instances of each days hour and create a new dataframe 'df2' where 'Date' is now the start of each hour for each day and 'Value1' is the average values. The resulting df2 would look like:

Date Value1 Value2
2022-03-14 13:00:00 3.5 5.5
2022-03-14 14:00:00 35 65
Jacob
  • 329
  • 2
  • 10

2 Answers2

2

You can use floor_date() from lubridate to round the date-time object down to the nearest boundary of hours.

library(dplyr)
library(lubridate)

df %>%
  group_by(Date = floor_date(Date, "hour")) %>%
  summarise(across(contains("Value"), mean))

# # A tibble: 2 × 3
#   Date                Value1 Value2
#   <dttm>               <dbl>  <dbl>
# 1 2022-03-14 13:00:00    3.5    5.5
# 2 2022-03-14 14:00:00   35     65

Data
df <- read.csv(text = "Date, Value1, Value2
2022-03-14 13:00:00, 1, 3
2022-03-14 13:10:00, 2, 4
2022-03-14 13:20:00, 3, 5
2022-03-14 13:30:00, 4, 6
2022-03-14 13:40:00, 5, 7
2022-03-14 13:50:00, 6, 8
2022-03-14 14:00:00, 10, 40
2022-03-14 14:10:00, 20, 50
2022-03-14 14:20:00, 30, 60
2022-03-14 14:30:00, 40, 70
2022-03-14 14:40:00, 50, 80
2022-03-14 14:50:00, 60, 90", colClasses = c(Date = "POSIXct"))
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
  • Thank you so much! Is there a way to summarise my all columns so that you don't have to rely on column names? – Jacob Jan 12 '23 at 18:16
  • 1
    @Jacob you can try different "tidy selection" syntax in `across()`. If you want to summarize all columns except the grouping variables, try `across(everything(), mean)`. Another choice can be `across(where(is.numeric), mean)` that summarizes all numeric columns. – Darren Tsai Jan 12 '23 at 18:27
  • 1
    That's a great solution in case all columns don't have a common str in their name. Thank you! – Jacob Jan 12 '23 at 18:29
1

You can use substr if you previously parse your Date to char:

df$Date <- as.character(df$Date)
substr(df$Date, 15,16) <- "00"
df$Date <- as.POSIXct(df$Date)
df %>% group_by(Date) %>% summarise(Value1 = mean(Value1), Value2 = mean(Value2))

Output:

A tibble: 2 × 3
 Date                Value1 Value2
 <dttm>                <dbl>  <dbl>
1 2022-03-14 13:00:00    3.5    5.5
2 2022-03-14 14:00:00   35     65 

Data:

    df <- data.frame(
  Date = as.POSIXct(c("2022-03-14 13:00:00",
           "2022-03-14 13:10:00","2022-03-14 13:20:00","2022-03-14 13:30:00",
           "2022-03-14 13:40:00","2022-03-14 13:50:00","2022-03-14 14:00:00",
           "2022-03-14 14:10:00","2022-03-14 14:20:00","2022-03-14 14:30:00",
           "2022-03-14 14:40:00","2022-03-14 14:50:00")),
  Value1 = c(1L, 2L, 3L, 4L, 5L, 6L, 10L, 20L, 30L, 40L, 50L, 60L),
  Value2 = c(3L, 4L, 5L, 6L, 7L, 8L, 40L, 50L, 60L, 70L, 80L, 90L)
)
juanbarq
  • 374
  • 6
  • 1
    Theoretically, `Date` should be a `POSIXct` object, not `character`. The line `substr(df$Date, 15,16) <- "00"` will break down and show *"Error: replacing substrings in a non-character object"*. You can use the data in my answer to reproduce the error. – Darren Tsai Jan 12 '23 at 17:41
  • 1
    You are right @Darren Tsai, I edit my answer to a POSIXct Date solution! – juanbarq Jan 12 '23 at 17:46
  • Thank you for your efforts! I accepted the other answer because it was first, but this answer was also helpful. – Jacob Jan 12 '23 at 18:16