
I have this data representing the value of a variable Q1 over time. Time is not given as dates; it is given as the number of days since an event.

https://www.mediafire.com/file/yfzbx67yivvvkgv/dat.xlsx/file

I'm trying to plot the mean value of Q1 over time, as in this question:

Plotting average of multiple variables in time-series using ggplot

I'm using this code:

    library(ggplot2)
    library(Hmisc)  # provides the bootstrap helper behind mean_cl_boot

    ggplot(dat, aes(x = days, y = Q1, colour = type, group = type)) +
      stat_summary(fun.data = "mean_cl_boot", geom = "smooth")

Ana

1 Answer


Besides the code, which does not appear to work with the current ggplot2 version, there is a second problem: your data is not really suited to this kind of plot. This code produces the plot you asked for:

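library(ggplot2)
library(dplyr)

# Read the spreadsheet (requires the rio package)
dat <- rio::import("dat.xlsx")

dat %>% 
  ggplot(aes(x = days, y = Q1, colour = type, group = type)) +
  # mean_cl_boot needs the Hmisc package to be installed
  geom_smooth(stat = 'summary', fun.data = mean_cl_boot)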

But the plot doesn't really tell you anything, simply because there aren't enough values in your data. Most days seem to have only one value, the values jump quickly up and down, and the gaps between days are sometimes quite large.
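
One way to check how sparse the data is would be to count the observations per type and day. The following is only a minimal sketch on made-up data: `toy` and its values are hypothetical and merely borrow the column names `type`, `days`, and `Q1` from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for dat.xlsx (same column names assumed)
toy <- data.frame(
  type = c("A", "A", "A", "B", "B"),
  days = c(10, 10, 250, 12, 400),
  Q1   = c(3, 5, 2, 4, 1)
)

# Number of observations for each combination of type and day
per_day <- toy %>% count(type, days, name = "n_obs")

# Share of (type, day) combinations that have a single observation
single_share <- mean(per_day$n_obs == 1)
single_share
```

Running the same `count()` call on the real `dat` should show how many days have more than one value to average.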

You can see this when you group the values into timespans instead. Here I used round(days, -2), which rounds to the nearest 100 (e.g., 756 becomes 800, 301 becomes 300, 49 becomes 0):

dat %>% 
  mutate(days = round(days, -2)) %>% 
  ggplot(aes(x = days, y = Q1, colour = type, group = type)) +
  geom_smooth(stat = 'summary', fun.data = mean_cl_boot)
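
To make the rounding step concrete, here is a small base R sketch. The `days` vector is hypothetical and chosen only to illustrate how individual days collapse into 100-day bins.

```r
# round() with a negative digits argument rounds to tens, hundreds, ...
round(c(756, 301, 49), -2)
#> [1] 800 300   0

# Hypothetical day values, only to illustrate the grouping
days <- c(12, 49, 301, 340, 756, 790)
bins <- round(days, -2)

# Number of observations that end up in each 100-day bin
table(bins)
```

Each bin now holds several observations, which gives the summary function something to average.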

This should be the same kind of plot as the linked one, but with huge confidence intervals. That is not surprising since, as mentioned, the values alternate quickly between 1 and 5. I hope that helps.
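
To see why the intervals are wide, here is a rough sketch of a percentile bootstrap of the mean in base R. It is similar in spirit to what mean_cl_boot computes, but it is not the Hmisc implementation, and the `q1` values are hypothetical.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical ratings in one bin, alternating across the 1-5 scale
q1 <- c(1, 5, 2, 4, 1, 5)

# Percentile bootstrap of the mean: resample with replacement 1000 times
boot_means <- replicate(1000, mean(sample(q1, replace = TRUE)))
ci <- quantile(boot_means, c(0.025, 0.975))

mean(q1)  # point estimate
ci        # 95% interval, wide relative to the 1-5 scale
```

With only a handful of strongly varying values per bin, the interval covers a large part of the scale.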

JBGruber