
I have a file which contains time-series data for multiple variables from a to k.

I would like to create a graph that plots the average of the variables a to k over time. Above and below that average line, it should add a smoothed area representing the maximum and minimum variation on each day.

So something like confidence intervals but in a smoothed version.

Here's the dataset: https://dl.dropbox.com/u/22681355/co.csv

and here's the code I have so far:

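library(ggplot2)
library(reshape2)
# Assumes co.csv (linked above) has been downloaded to the working directory
df <- read.csv("co.csv")
# Reshape to long format: one row per Year/variable pair
meltdf <- melt(df, id = "Year")
ggplot(meltdf, aes(x = Year, y = value, colour = variable, group = variable)) + geom_line()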

1 Answer


This depicts bootstrapped 95 % confidence intervals:

ggplot(meltdf,aes(x=Year,y=value,colour=variable,group=variable)) +
  stat_summary(fun.data = "mean_cl_boot", geom = "smooth")


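This depicts the mean of all values of all variables ± 1 SD:

# Arguments for the summary function go in fun.args (ggplot2 >= 2.0.0);
# recent releases may also need se = TRUE for the band to be drawn.
ggplot(meltdf, aes(x = Year, y = value)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), geom = "smooth")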


You might want to calculate the year means before calculating the means and SD over the variables, but I leave that to you.

However, I believe a bootstrap confidence interval would be more sensible, since the distribution is clearly not symmetric. It would also be narrower. ;)

And of course you could log-transform your values.
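As a rough sketch of the log-transform idea: adding `scale_y_log10()` makes ggplot2 compute the summary on the log10 values, so the mean ± 1 SD band is symmetric on the log scale and asymmetric once shown on the original axis. The data below is synthetic and only stands in for `meltdf`, and `mean_sd` is a hypothetical helper written for this example (it is not part of ggplot2); all values must be positive.

```r
library(ggplot2)

# Synthetic, right-skewed stand-in for meltdf (an assumption; not the real co.csv data)
set.seed(42)
meltdf <- expand.grid(Year = 1990:2010, variable = letters[1:11])
meltdf$value <- exp(rnorm(nrow(meltdf), mean = 2, sd = 0.5))

# Hypothetical helper: mean +/- 1 SD in the format stat_summary(fun.data = ...) expects
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# scale_y_log10() transforms the values before the summary is computed
p <- ggplot(meltdf, aes(x = Year, y = value)) +
  stat_summary(fun.data = mean_sd, geom = "ribbon", alpha = 0.3) +
  stat_summary(fun.data = mean_sd, geom = "line") +
  scale_y_log10()
p
```

Note that the line drawn this way is the back-transformed mean of the logs (a geometric mean), not the arithmetic mean of the raw values.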

  • Roland thanks, maybe I've not been clear, but what I would like to do is plot the average of all these variables and instead of confidence intervals have shaded areas for each day which represents the variation in a to k above and below the average for each day. – user1723765 Nov 12 '12 at 10:11
  • I do not know what you mean by "variation above and below the average". – Roland Nov 12 '12 at 10:13
  • 1. you take the average of a-k and plot that for each day. 2.for each day you will have some of the a-k variables above and below that average. I would like to have a shaded area showing how much variation there is around the mean. – user1723765 Nov 12 '12 at 10:20
  • @user1723765 I added an example of what I think you want. – Roland Nov 12 '12 at 10:20
  • Yes this is what I would like to have. One extra thing that would be useful is to somehow add different shading in different areas depending on where the majority of values lie. Is that possible to do? thanks for your help – user1723765 Nov 12 '12 at 11:39
  • @user1723765: You don't seem to realize how difficult that is. You probably want (3D, contour) density plots at each time point and then interpolate between those. But of course density is constrained to the integral being 1 and that would need to be considered for interpolation. If you don't want to interpolate, you could use `geom_violin`. – Roland Nov 12 '12 at 15:19
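
The shaded band described in the comments (the per-day average of a to k, with the area between the smallest and largest value on that day) could be sketched roughly as follows. This is only one possible reading of the request, not necessarily the example added to the answer. The data is synthetic and stands in for `meltdf`, and the names `band`, `avg`, `lo` and `hi` are made up for illustration.

```r
library(ggplot2)

# Synthetic stand-in for meltdf (an assumption; not the real co.csv data)
set.seed(1)
meltdf <- expand.grid(Year = 1990:2010, variable = letters[1:11])
meltdf$value <- 10 + rnorm(nrow(meltdf))

# Per-Year mean, minimum and maximum across the variables a to k
band <- data.frame(
  Year = sort(unique(meltdf$Year)),
  avg  = as.vector(tapply(meltdf$value, meltdf$Year, mean)),
  lo   = as.vector(tapply(meltdf$value, meltdf$Year, min)),
  hi   = as.vector(tapply(meltdf$value, meltdf$Year, max))
)

# Shaded min-max envelope with the average drawn on top
p <- ggplot(band, aes(x = Year)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.3) +
  geom_line(aes(y = avg))
p
```

If a smoother envelope is wanted, the `lo` and `hi` columns could be passed through a smoother such as `stats::loess` before plotting, at the cost of the band no longer touching the exact daily extremes.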