0

I have data in the following format:

        Date    Year    Month   Day     Flow
1   1953-10-01  1953    10       1      530
2   1953-10-02  1953    10       2      530
3   1953-10-03  1953    10       3      530

I would like to create a graph like this:

Here is my current image and code:

library(ggplot2)
library(plyr)
library(reshape2)
library(scales)

## Read Data
df <- read.csv("Salt River Flow.csv")

## Convert Date column to R-recognized dates
df$Date <- as.Date(df$Date, "%m/%d/%Y")

## Finds Water Years (Oct - Sept)
df$WY <- as.POSIXlt(as.POSIXlt(df$Date)+7948800)$year+1900

## Normalizes Water Years so stats can be applied to just months and days
df$w <- ifelse(month(df$Date) %in% c(10,11,12), 1903, 1904)

##Creates New Date (dat) Column
df$dat <- as.Date(paste(df$w,month(df$Date),day(df$Date), sep = "-"))

## Creates new data frame with summarised data by MonthDay
PlotData <- ddply(df, .(dat), summarise, Min = min(Flow), Tenth = quantile(Flow, p = 0.05), TwentyFifth = quantile(Flow, p =    0.25), Median = quantile(Flow, p = 0.50), Mean = mean(Flow), SeventyFifth = quantile(Flow, p = 0.75), Ninetieth = quantile(Flow, p = 0.90), Max = max(Flow))

## Melts data so it can be plotted with ggplot
m <- melt(PlotData, id="dat")

## Plots
p <- ggplot(m, aes(x = dat)) + 
geom_ribbon(aes(min = TwentyFifth, max = Median), data = PlotData, fill = alpha("black", 0.1), color = NA) + 
geom_ribbon(aes(min = Median, max = SeventyFifth), data = PlotData, fill = alpha("black", 0.5), color = NA) + 
scale_x_date(labels = date_format("%b"), breaks = date_breaks("month"), expand = c(0,0)) + 
geom_line(data = subset(m, variable == "Mean"), aes(y = value), size = 1.2) + 
theme_bw() + 
geom_line(data = subset(m, variable %in% c("Min","Max")), aes(y = value, group = variable)) + 
geom_line(data = subset(m, variable %in% c("Ninetieth","Tenth")), aes(y = value, group = variable), linetype = 2) + 
labs(x = "Water Year", y = "Flow (cfs)")

p

I am very close but there are some issues I'm having. First, if you can see a way to improve my code, please let me know. The main problem I ran into was that I needed two dataframes to make this graph: one melted, and one not. The unmelted dataframe was necessary (I think) to create the ribbons. I tried many ways to use the melted dataframe for the ribbons, but there was always a problem with the aesthetic length.

Second, I know to have a legend - and I want one, I need to have something in the aesthetics of each line/ribbon, but I am having trouble getting that to work. I think it would involve scale_fill_manual.

Third, and I don't know if this is possible, I would like to have each month label in between the tick marks, not on them (like in the above image).

Any help is greatly appreciated (especially with creating more efficient code).

Thank you.

John Paul
  • 12,196
  • 6
  • 55
  • 75
  • This SO question has some related `ggplot2` solutions http://stackoverflow.com/questions/6340485/r-plot-a-time-series-with-quantiles-using-ggplot2 – sckott Oct 31 '13 at 21:14
  • What didn't work about the quantile approach? Your question needs to be more specific – Señor O Oct 31 '13 at 21:21

3 Answers3

1

Perhaps this will get you closer to what you're looking for, using ggplot2 and plyr:

library(ggplot2)
library(plyr)
library(lubridate)
library(scales)
df$MonthDay <- df$Date - years( year(df$Date) + 100 ) #Normalize points to same year
df <- ddply(df, .(Month, Day), mutate, MaxDayFlow = max(Flow) ) #Max flow on day
df <- ddply(df, .(Month, Day), mutate, MinDayFlow = min(Flow) ) #Min flow on day
p <- ggplot(df, aes(x=MonthDay) ) +
    geom_smooth(size=2,level=.8,color="black",aes(y=Flow)) + #80% conf. interval
    geom_smooth(size=2,level=.5,color="black",aes(y=Flow)) + #50% conf. interval
    geom_line( linetype="longdash", aes(y=MaxDayFlow) ) +
    geom_line( linetype="longdash", aes(y=MinDayFlow) ) +
    labs(x="Month",y="Flow") +
    scale_x_date( labels = date_format("%b") ) +
    theme_bw()

Edit: Fixed X scale and X scale label

keithing
  • 281
  • 1
  • 3
  • Thank you very much. I have combined some of your ideas with others for the following code: – user2943039 Nov 01 '13 at 20:59
  • Sorry for the confusion, but I have created a new post here if you wouldn't mind taking a look at it:http://stackoverflow.com/questions/19735509/graph-of-statistical-time-series-part-2-ribbons-on-ggplot?noredirect=1#comment29323747_19735509 – user2943039 Nov 01 '13 at 21:41
  • Also, I wasn't looking for the 80% and 50% confidence intervals around the mean, actually just the quantiles of the entire data. – user2943039 Nov 02 '13 at 00:03
  • Would you please take a look at my new code? I've updated this post. – user2943039 Nov 03 '13 at 00:01
1

Something along these lines might get you close with base:

library(lubridate)
library(reshape2)
# simulating data...
Date  <- seq(as.Date("1953-10-01"),as.Date("2010-10-01"),by="day")
Year  <- year(Date)
Month <- month(Date)
Day <- day(Date)
set.seed(1)
Flow <- rpois(length(Date), 2000)
Data <- data.frame(Date=Date,Year=Year,Month=Month,Day=Day,Flow=Flow)

# use acast to get it in a convenient shape:
PlotData <- acast(Data,Year~Month+Day,value.var="Flow")
# apply for quantiles
Quantiles <- apply(PlotData,2,function(x){
    quantile(x,probs=c(1,.9,.75,.5,.25,.1,0),na.rm=TRUE)
  })
Mean <- colMeans(PlotData, na.rm=TRUE)
# ugly way to get month tick separators
MonthTicks <- cumsum(table(unlist(lapply(strsplit(names(Mean),split="_"),"[[",1))))

# and finally your question:
plot(1:366,seq(0,max(Flow),length=366),type="n",xlab = "Water Year",ylab="Discharge",axes=FALSE)
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["75%",])),border=NA,col=gray(.6))
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["25%",])),border=NA,col=gray(.4))
lines(1:366,Quantiles["90%",], col = gray(.5), lty=4)
lines(1:366,Quantiles["10%",], col = gray(.5))
lines(1:366,Quantiles["100%",], col = gray(.7))
lines(1:366,Quantiles["0%",], col = gray(.7), lty=4)
lines(1:366,Mean,lwd=3)
axis(1,at=MonthTicks, labels=NA)
text(MonthTicks-15,-100,1:12,pos=1,xpd=TRUE)
axis(2)

The plotting code really isn't that tricky. You'll need to clean up the aesthetics, but polygon() is usually my strategy for shaded regions in plots (confidence bands, whatever).

enter image description here

tim riffe
  • 5,651
  • 1
  • 26
  • 40
  • Hi Tim, I have pretty much got what I want with a code in ggplot2. However, I can't seem to get the x axis labels to go in between the tick marks like you have here...they are at the tick marks. Any thoughts? I haven't been able to find anything besides "hjust" which only changes the text justification of the label. – user2943039 Nov 02 '13 at 22:23
  • Hi Tim, would you please look at my updated code and let me know what you think? – user2943039 Nov 03 '13 at 00:01
  • I don't know my way around ggplot2, sorry. You might try looking here: http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/ – tim riffe Nov 03 '13 at 03:42
0

(Partial answer with base plotting function and not including the min, max, or mean.) I suspect you will need to construct a dataset before passing to ggplot, since that is typical for that function. I already do something similar and then pass the resulting matrix to matplot. (It doesn't do that kewl highlighting, but maybe ggplot can do it>

HDL.mon.mat <- aggregate(dfrm$Flow, 
               list(  dfrm$Year + dfrm$Month/12), 
               quantile, prob=c(0.1,0.25,0.5,0.75, 0.9), na.rm=TRUE)
matplot(HDL.mon.mat[,1], HDL.mon.mat$x, type="pl")
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • Thank you! I am new to the forum, so I created a new question instead of editing this one, but let me know what you think:http://stackoverflow.com/questions/19735509/graph-of-statistical-time-series-part-2-ribbons-on-ggplot?noredirect=1#comment29323747_19735509 – user2943039 Nov 01 '13 at 21:42