R Average values of posixct rows by hour

Question

I have a dataframe 'df1' that has a posixct column 'Date' and other data columns with values associated with dates and times in 'Date'. The date look like this:

Date	Value1	Value2
2022-03-14 13:00:00	1	3
2022-03-14 13:10:00	2	4
2022-03-14 13:20:00	3	5
2022-03-14 13:30:00	4	6
2022-03-14 13:40:00	5	7
2022-03-14 13:50:00	6	8
2022-03-14 14:00:00	10	40
2022-03-14 14:10:00	20	50
2022-03-14 14:20:00	30	60
2022-03-14 14:30:00	40	70
2022-03-14 14:40:00	50	80
2022-03-14 14:50:00	60	90

I would like to average the values in 'Value1' and 'Value2' over all instances of each days hour and create a new dataframe 'df2' where 'Date' is now the start of each hour for each day and 'Value1' is the average values. The resulting df2 would look like:

Date	Value1	Value2
2022-03-14 13:00:00	3.5	5.5
2022-03-14 14:00:00	35	65

Darren Tsai · Accepted Answer · 2023-01-12T17:37:44.550

2

You can use floor_date() from lubridate to round the date-time object down to the nearest boundary of hours.

library(dplyr)
library(lubridate)

df %>%
  group_by(Date = floor_date(Date, "hour")) %>%
  summarise(across(contains("Value"), mean))

# # A tibble: 2 × 3
#   Date                Value1 Value2
#   <dttm>               <dbl>  <dbl>
# 1 2022-03-14 13:00:00    3.5    5.5
# 2 2022-03-14 14:00:00   35     65

Data

df <- read.csv(text = "Date, Value1, Value2
2022-03-14 13:00:00, 1, 3
2022-03-14 13:10:00, 2, 4
2022-03-14 13:20:00, 3, 5
2022-03-14 13:30:00, 4, 6
2022-03-14 13:40:00, 5, 7
2022-03-14 13:50:00, 6, 8
2022-03-14 14:00:00, 10, 40
2022-03-14 14:10:00, 20, 50
2022-03-14 14:20:00, 30, 60
2022-03-14 14:30:00, 40, 70
2022-03-14 14:40:00, 50, 80
2022-03-14 14:50:00, 60, 90", colClasses = c(Date = "POSIXct"))

edited Jan 12 '23 at 17:37

answered Jan 12 '23 at 17:27

Darren Tsai

32,117
5
21
51

Thank you so much! Is there a way to summarise my all columns so that you don't have to rely on column names? – Jacob Jan 12 '23 at 18:16
1

@Jacob you can try different "tidy selection" syntax in `across()`. If you want to summarize all columns except the grouping variables, try `across(everything(), mean)`. Another choice can be `across(where(is.numeric), mean)` that summarizes all numeric columns. – Darren Tsai Jan 12 '23 at 18:27
1

That's a great solution in case all columns don't have a common str in their name. Thank you! – Jacob Jan 12 '23 at 18:29

juanbarq · Answer 2 · 2023-01-12T17:54:31.653

1

You can use substr if you previously parse your Date to char:

df$Date <- as.character(df$Date)
substr(df$Date, 15,16) <- "00"
df$Date <- as.POSIXct(df$Date)
df %>% group_by(Date) %>% summarise(Value1 = mean(Value1), Value2 = mean(Value2))

Output:

A tibble: 2 × 3
 Date                Value1 Value2
 <dttm>                <dbl>  <dbl>
1 2022-03-14 13:00:00    3.5    5.5
2 2022-03-14 14:00:00   35     65

Data:

    df <- data.frame(
  Date = as.POSIXct(c("2022-03-14 13:00:00",
           "2022-03-14 13:10:00","2022-03-14 13:20:00","2022-03-14 13:30:00",
           "2022-03-14 13:40:00","2022-03-14 13:50:00","2022-03-14 14:00:00",
           "2022-03-14 14:10:00","2022-03-14 14:20:00","2022-03-14 14:30:00",
           "2022-03-14 14:40:00","2022-03-14 14:50:00")),
  Value1 = c(1L, 2L, 3L, 4L, 5L, 6L, 10L, 20L, 30L, 40L, 50L, 60L),
  Value2 = c(3L, 4L, 5L, 6L, 7L, 8L, 40L, 50L, 60L, 70L, 80L, 90L)
)

edited Jan 12 '23 at 17:54

answered Jan 12 '23 at 17:31

juanbarq

374
6

1

Theoretically, `Date` should be a `POSIXct` object, not `character`. The line `substr(df$Date, 15,16) <- "00"` will break down and show *"Error: replacing substrings in a non-character object"*. You can use the data in my answer to reproduce the error. – Darren Tsai Jan 12 '23 at 17:41
1

You are right @Darren Tsai, I edit my answer to a POSIXct Date solution! – juanbarq Jan 12 '23 at 17:46
Thank you for your efforts! I accepted the other answer because it was first, but this answer was also helpful. – Jacob Jan 12 '23 at 18:16

R Average values of posixct rows by hour

2 Answers2

Data