calculating monthly averages (aggregated) in a data frame and plot it in ggplot

Question

I have this data frame with different columns (x1,x2) and I would like to calculate daily averages and monthly averages (aggregated) for those columns and plot them separately. The data frame looks like this. (it is only for 1 year so year aspect does not matter. How can I achieve this?

observed time        hour    day    month   x1       x2   
01/01/2022 00:00       0      1       1      4      7.0 
01/01/2022 01:00       1      1       1      4      6.9
01/01/2022 02:00       2      1       1      5      6.9
01/01/2022 03:00
....
30/11/2022 00:00       0     30      11      5     14.4

You have two questions: how to aggregate, and how to plot. Both are covered fairly copiously in other questions (mostly separately) in StackOverflow. "How to plot" depends a lot on what you're trying to communicate in the plot. Scatter plot, `x2 ~ x1`? Time-series dots/lines? Binned barplots or histograms? I think for your first question, this is a dupe of https://stackoverflow.com/q/11562656/3358272; for how to plot, I think the best advice is to look at https://r-graph-gallery.com and see what you find interesting. — r2evans, Dec 22 '22 at 20:53

score 2 · Answer 1 · answered Dec 22 '22 at 21:11

dat <- data.frame(observed = seq(as.POSIXct("2022-01-01 00:00:00"), as.POSIXct("2022-12-31 23:59:00"), by = "hour"))
nrow(dat)
# [1] 8760

set.seed(42)
dat$x1 <- runif(nrow(dat), 0, 5)
dat$x2 <- runif(nrow(dat), 0, 12)

head(dat)
#              observed       x1        x2
# 1 2022-01-01 00:00:00 4.574030  7.884313
# 2 2022-01-01 01:00:00 4.685377  7.754217
# 3 2022-01-01 02:00:00 1.430698 11.183153
# 4 2022-01-01 03:00:00 4.152238  5.743904
# 5 2022-01-01 04:00:00 3.208728  2.976310
# 6 2022-01-01 05:00:00 2.595480  6.328965

dplyr

library(dplyr)
dat <- tibble(dat) %>%
  mutate(
    month = factor(format(observed, format = "%b"), levels = month.abb),
    day = as.Date(observed)
  )
dat
# # A tibble: 8,760 × 5
#    observed               x1    x2 month day       
#    <dttm>              <dbl> <dbl> <fct> <date>    
#  1 2022-01-01 00:00:00 4.57   7.88 Jan   2022-01-01
#  2 2022-01-01 01:00:00 4.69   7.75 Jan   2022-01-01
#  3 2022-01-01 02:00:00 1.43  11.2  Jan   2022-01-01
#  4 2022-01-01 03:00:00 4.15   5.74 Jan   2022-01-01
#  5 2022-01-01 04:00:00 3.21   2.98 Jan   2022-01-01
#  6 2022-01-01 05:00:00 2.60   6.33 Jan   2022-01-01
#  7 2022-01-01 06:00:00 3.68   5.75 Jan   2022-01-01
#  8 2022-01-01 07:00:00 0.673  1.60 Jan   2022-01-01
#  9 2022-01-01 08:00:00 3.28   8.85 Jan   2022-01-01
# 10 2022-01-01 09:00:00 3.53   2.66 Jan   2022-01-01
# # … with 8,750 more rows
# # ℹ Use `print(n = ...)` to see more rows

Group by month:

library(tidyr) # pivot_longer
library(ggplot2)
dat %>%
  group_by(month) %>%
  summarize(across(c(x1, x2), ~ mean(.))) %>%
  pivot_longer(-month) %>%
  ggplot(aes(month, value)) +
  geom_line(aes(color = name, group = name))

Aggregate by day, just replace month with day above:

dat %>%
  group_by(day) %>%
  summarize(across(c(x1, x2), ~ mean(.))) %>%
  pivot_longer(-day) %>%
  ggplot(aes(day, value)) +
  geom_line(aes(color = name, group = name))

Combine them into one plot. We have to be a little creative here with monthly, since month abbreviation is not a date.

daily <- dat %>%
  group_by(day) %>%
  summarize(across(c(x1, x2), ~ mean(.))) %>%
  pivot_longer(-day)
monthly <- dat %>%
  group_by(month) %>%
  summarize(day = as.Date(first(observed)), across(c(x1, x2), ~ mean(.)), .groups = "drop") %>%
  select(-month) %>%
  pivot_longer(-day)

bind_rows(lst(monthly, daily), .id = "frequency") %>%
  ggplot(aes(day, value)) +
  geom_line(aes(group = interaction(frequency, name), color = interaction(frequency, name)))

Notes:

For the first aggregation by-month using month abbreviations on the x-axis, if we did not make them factor(.) then the order would have been alphabetic, not by month itself. This is a common theme in ggplot2: if you need a categorical field ordered any way but alphabetic, make it a factor.
There are numerous ways to shape the x-axis with dates, see scale_x_date (and scale_x_datetime when using POSIXct timestamps). For instance, you can change the frequency of ticks (breaks= or date_breaks=) and how they appear (labels= or date_labels=).

hi @r2evans the answer you have provided was very very useful. I was wondering if you could also tell me if it is also possible to produce a plot of daily aggregated mean from day 1-30 so the x axis would should day 1 from 30 in a plot :) — Dome, Dec 28 '22 at 23:04
Sure, add day-of-week (`mutate(dow = format(day, format="%d")`) and then plot against it instead of `day`. — r2evans, Dec 29 '22 at 04:50

calculating monthly averages (aggregated) in a data frame and plot it in ggplot

1 Answers1

dplyr