3

I am trying to create a GGPLOT2 smoothed line graph that looks more like this

enter image description here

Source: http://www.esrl.noaa.gov/psd/enso/mei/

and less like this:

enter image description here

Source: https://dl.dropboxusercontent.com/u/16400709/StackOverflow/Rplot02.png

My data are available on dropbox.

Having looked at previous posts I used the code below:

#MEI Line Graph

d4 <- read.csv("https://dl.dropboxusercontent.com/u/16400709/StackOverflow/Data_MEI.csv")
head(d4,n=20)

MEI<-ggplot(d4,aes(x=d4$Date, y=d4$MEI,group=1))+geom_line()

MEI+stat_smooth(method ="auto",level=0.95)

What I think I need is to reduce the amount of smoothing taking place, but I have yet to figure out how to achieve this.

d4s<-SMA(d4$MEI,n=8)
plot.ts(d4s)

SMA() works well but I cant get it to work with ggplot Any hints would be appreciated!

Andy Clifton
  • 4,926
  • 3
  • 35
  • 47
Methexis
  • 2,739
  • 5
  • 24
  • 34
  • Which graph on that web page are you trying to reproduce? – Roland Aug 18 '14 at 16:19
  • apologies...the top one! – Methexis Aug 18 '14 at 16:20
  • 2
    First rule of using statistics codes: if you want to really understand what your analysis and plots show, you have to code things yourself. So, here, I would encourage you to smooth the data external to ggplot so you have more control (and visibility) over whats happening. – Andy Clifton Aug 18 '14 at 16:22
  • 2
    You need to play with moving averages. – Fernando Aug 18 '14 at 16:23
  • 1
    Something like `stat_smooth(method="loess", span=0.1)` and adjust the span value to your liking. However, the graph you reference filled the area between line and axis. – Roland Aug 18 '14 at 16:26
  • Span! Thats the variable...thank you very much Roland! If you can write this as an answer I will close the question so you get your points, bounty etc! – Methexis Aug 18 '14 at 16:38

1 Answers1

10

Be aware that the MEI index is for a 2-month period, so it's already got some smoothing built in. Assuming that you are using the MEI data that NOAA ESRL publishes, you should be able to create the same plot.

First of all you need to get the system set up, as you'll be working with timezeones:

# set things up  ----
working.dir = file.path('/code/R/StackOverflow/')
setwd(working.dir)
Sys.setenv(TZ='GMT')

now, download your data and read it in

d.in <- read.csv("MEI.txt")

The next step is to get the dates formatted properly.

d.in$Date <- as.POSIXct(d.in$Date,
                        format = "%d/%m/%Y", 
                        tz = "GMT")

and because we need to figure out where things cross the x-axis, we'll have to work in decimal dates. Use the Epoch value:

d <- data.frame(x = as.numeric(format(d.in$Date,
                                      '%s')),
                y = d.in$MEI)

Now we can figure out the zero-crossings. We'll use Beroe's example for that.

rx <- do.call("rbind",
              sapply(1:(nrow(d)-1), function(i){
                f <- lm(x~y, d[i:(i+1),])
                if (f$qr$rank < 2) return(NULL)
                r <- predict(f, newdata=data.frame(y=0))
                if(d[i,]$x < r & r < d[i+1,]$x)
                  return(data.frame(x=r,y=0))
                else return(NULL)
              }))

and tack that on the end of the initial data:

d2 <- rbind(d,rx)

now convert back to dates:

d2$date <- as.POSIXct(d2$x,
                      origin = "1960-01-01",
                      format = "%s",
                      tz = "GMT")

now we can do the plot:

require(ggplot2)
ggplot(d2,aes(x = date,
              y = y)) + 
  geom_area(data=subset(d2, y<=0), fill="blue") + 
  geom_area(data=subset(d2, y>=0), fill="red") + 
  scale_y_continuous(name = "MEI")

and that gives you this:

enter image description here

Now, do you really need to smooth this?

Community
  • 1
  • 1
Andy Clifton
  • 4,926
  • 3
  • 35
  • 47
  • 1
    Why do you use `POSIXct` for this? `Date` should be preferable as you wouldn't have to worry about time zones. – Roland Aug 19 '14 at 07:02
  • 1
    I use posixct precisely because it lets me control time zones. I do a lot of work with time series data and I prefer to be able to control exactly what happens to any date/time. Also, I did need to calculate the zero crossings so I figured posixct as I could convert those times into epoch seconds and get the zero crossings more accurately. – Andy Clifton Aug 19 '14 at 14:16
  • 1
    I should add that as soon as you start changing date / time formats then time zones can bite. – Andy Clifton Aug 19 '14 at 14:17
  • Im not sure if anyone will pick this up, but I have come back to this example and data.frame(x = as.numeric(format(d$Date,'%s')),y = d$MEI) is turning my Dates into NA's, any idea why this might be? Would reading if from a CSV rather than a text file make a difference? – Methexis Sep 30 '14 at 21:10
  • 1
    @Methexis: it's hard to say without a worked example that explains your question about read.csv in more depth. I suspect your problem is actually with the date formatting in your comment, not the read. I'd look at the first row in 'd' to confirm the read from file worked, then debug from there. – Andy Clifton Oct 01 '14 at 02:23
  • odd....started a fresh session of RStudio this morning and all was good! Thanks again! – Methexis Oct 01 '14 at 07:14
  • 1
    @Methexis; could you provide some feedback on my answer to your original question? I'm happy to try to expand if required. Otherwise if it's a good answer there's always the big green tick thing you could click on :) – Andy Clifton Oct 03 '14 at 05:41