
What I would like to do is a plot (using ggplot) where the x axis represents years, and the points for the last three years have a different colour from the rest. The colour of the last three years depends on a criterion: if the mean of the last three years is less than the 66th percentile of the remaining years, they should be green; if it is greater, they should be red. So far I have written two functions, one calculating the mean of the last three years:

LYM3 <- function(x) {
  last3 <- tail(x, 3)
  mean(last3$Data, na.rm = TRUE)
}

and one calculating the 66th percentile of the remaining years:

perc66 <- function(x) {
  rest <- head(x, -3)
  quantile(rest$Data, .66, names = FALSE, na.rm = TRUE)
}
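The two functions can be combined into a single check that returns the colour the last three years should get. This is a minimal sketch, assuming the data frame has a `Data` column as in the examples below; the helper name `last3_status()` and its `n` and `p` arguments are hypothetical, not part of the original code. A mean exactly equal to the threshold counts as red here, since only "less than" is specified for green.

```r
# Hypothetical helper: compares the mean of the last n values of x$Data
# with the p-quantile of all earlier values and returns a colour name.
last3_status <- function(x, n = 3, p = 0.66) {
  recent  <- tail(x$Data, n)     # same rows as LYM3()
  earlier <- head(x$Data, -n)    # same rows as perc66()
  recent_mean <- mean(recent, na.rm = TRUE)
  threshold   <- quantile(earlier, p, names = FALSE, na.rm = TRUE)
  if (recent_mean < threshold) "green" else "red"
}

# Illustrative data: the last three values are far above the rest
demo <- data.frame(Year = 2001:2010, Data = c(1:7, 100, 100, 100))
last3_status(demo)
```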

Here are two data sets that can be used for the calculations and plots: the first is an example from my real data, where LYM3(df1) < perc66(df1); the second is made-up data, where LYM3(df2) > perc66(df2).

df1<- data.frame(Year=c(1979:2010),
                Data=c(347261.87,  145071.29,   110181.93,  183016.71,  210995.67,  205207.33,  103291.78,  247182.10,  152894.45,  170771.50,  206534.55,  287770.86,  223832.43,  297542.86,  267343.54,  475485.47,  224575.08,  147607.81,  171732.38,  126818.10,  165801.08,  136921.58,  136947.63,  83428.05,   144295.87,  68566.23,   59943.05,   49909.08,   52149.11,   117627.75,  132127.79,  130463.80))
df2 <- data.frame(Year=c(1979:2010),
                  Data=c(sample(50,29,replace=T),75,75,75))

Here’s my code for my plot so far:

plot <- ggplot(df1, aes(x=Year, y=Data)) +
  theme_bw() +
  geom_point(size=3, aes(colour=ifelse(df1$Year<2008, "black",ifelse(LYM3(df1) < perc66(df1),"green","red")))) +
  geom_line() +
  scale_x_continuous(breaks=c(1980,1985,1990,1995,2000,2005,2010), limits=c(1978,2011))
plot

As you can see, it doesn't do what I want. The only thing it seems to do is split the years into two levels, before 2008 and from 2008 on, and colour the points by these two levels instead of using the colour names I gave.

Since I don't want this cut-off year to be hard-coded either, I made another tiny function:

fun3 <- function(x) {
  df <- subset(x, Year == (max(Year) - 2))
  df$Year
}

With it, the point layer of the previous code can be written as follows (simplified here to a single highlight colour):

geom_point(size=3, aes(colour=ifelse(df1$Year<fun3(df1), "black","red"))) 
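Instead of looking up the cut-off year, the last three points can also be flagged by row position. This selects the same rows as `tail(x, 3)` whatever years the data cover, assuming the rows are sorted by year as in df1 and df2. A minimal sketch; `flag_last_n()` is a hypothetical name.

```r
# Hypothetical helper: TRUE for the last n rows of a data frame, FALSE otherwise.
# Assumes the rows are already ordered by Year.
flag_last_n <- function(x, n = 3) {
  seq_len(nrow(x)) > nrow(x) - n
}

demo <- data.frame(Year = 1995:2000, Data = c(4, 8, 15, 16, 23, 42))
is_recent <- flag_last_n(demo)
demo$Year[is_recent]   # the three most recent years
```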

But it still ignores my colours. Why does it turn the years into levels? And why doesn't an ifelse() nested inside another one work in this case? How can I get the arguments to do what I want? I realise this might be a bit messy, asking for a lot at the same time, but I hope my description is clear. It would be helpful if someone could at least point me in the right direction.
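The behaviour comes from putting the colour inside `aes()`. Anything inside `aes()` is treated as data to be mapped: the nested `ifelse()` does return the right strings, but ggplot treats them as the levels of a discrete variable and assigns its own default colours (plus a legend). Two ways out are to pass the colour vector outside `aes()` as a fixed parameter, or to keep it inside `aes()` and add `scale_colour_identity()`, which tells ggplot to use the values as colours directly. A minimal sketch of the second option on made-up illustrative data; the plot is only drawn if ggplot2 is installed.

```r
# Illustrative data (made up): the last three values are high
d <- data.frame(Year = 2001:2010,
                Data = c(5, 3, 8, 6, 4, 7, 2, 9, 9, 9))

cutoff      <- d$Year[nrow(d) - 2]                     # third-to-last year, like fun3()
recent_mean <- mean(tail(d$Data, 3))
threshold   <- quantile(head(d$Data, -3), 0.66, names = FALSE)

# The nested ifelse() works: the inner call returns one value,
# which is recycled for every row where Year >= cutoff
d$colour <- ifelse(d$Year < cutoff, "black",
                   ifelse(recent_mean < threshold, "green", "red"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = Year, y = Data)) +
    geom_line() +
    geom_point(aes(colour = colour), size = 3) +
    scale_colour_identity()   # use the strings as colours; no legend
  print(p)
}
```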

I also tried to put the plot code into a function, so that I wouldn't have to change the data frame name in every function call inside the plot, but I can't get it to work.
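Wrapping the plot in a function works as long as every calculation refers to the function's argument rather than to `df1`, and the colours are not left to ggplot's default scale. A minimal sketch under those assumptions; `plot_last3()` and its arguments are hypothetical names. This variant maps a category column and fixes its colours with `scale_colour_manual()`, and draws the 66% quantile as a dashed line by passing the value to `geom_hline()` outside `aes()`.

```r
# Hypothetical wrapper: x must have Year and Data columns, sorted by Year.
plot_last3 <- function(x, n = 3, p = 0.66) {
  recent    <- seq_len(nrow(x)) > nrow(x) - n
  threshold <- quantile(x$Data[!recent], p, names = FALSE, na.rm = TRUE)
  below     <- mean(x$Data[recent], na.rm = TRUE) < threshold

  # One category per row; the manual scale below turns categories into colours
  x$status <- ifelse(!recent, "earlier", if (below) "below" else "above")

  ggplot2::ggplot(x, ggplot2::aes(x = Year, y = Data)) +
    ggplot2::theme_bw() +
    ggplot2::geom_line() +
    ggplot2::geom_point(ggplot2::aes(colour = status), size = 3) +
    ggplot2::scale_colour_manual(
      values = c(earlier = "black", below = "green", above = "red"),
      guide = "none"
    ) +
    # A single constant value, so it is passed outside aes()
    ggplot2::geom_hline(yintercept = threshold, linetype = "dashed")
}

if (requireNamespace("ggplot2", quietly = TRUE)) {
  demo <- data.frame(Year = 1979:1990,
                     Data = c(3, 7, 4, 6, 5, 8, 2, 6, 5, 1, 2, 1))
  print(plot_last3(demo))
}
```

With the data frames from the question, `plot_last3(df1)` should highlight the last three years in green and `plot_last3(df2)` in red, matching the stated relation between LYM3 and perc66 for each.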

Thank you!

balconydoor

1 Answer


Here is my suggestion. I would avoid ifelse() inside the colour mapping; it makes the code hard to read for me. I subsetted the data to calculate the mean for the last three years (2008-2010) and the 0.66 quantile for the remaining years. Then I created two colour vectors: b is black (29 times) followed by green (3 times), and c is black (29 times) followed by red (3 times). Finally, the ggplot figure is drawn inside a conditional statement: if mean(foo$Data) < quantile(foo2$Data, 0.66) is TRUE, R uses b for the colours, which includes green; otherwise it uses c. Because the colour vector is passed as a fixed parameter (outside aes()), ggplot uses the colours as given, so you do not have to do much about colours in ggplot(). I hope this helps.

UPDATES ADDED

I changed the filtering part. For the quantile line, this post is very useful: basically, you need a dummy data frame holding the 0.66 quantile. A geom_hline() layer has been added as well.

library(ggplot2)

# Filter data. If you are sure that the last three rows are the ones you need
# to extract, this is the way.
foo <- tail(df1, n = 3)
foo2 <- head(df1, n = -3)   # all rows except the last three

# Set up colours
a <- c(nrow(foo2), nrow(foo))
b <- rep(c("black", "green"), a)
c <- rep(c("black", "red"), a)

# Create a dummy data frame for the quantile line
# Column names can be anything (here, X and Z)

agasi <- data.frame(X = c("A"), Z = quantile(foo2$Data, 0.66))

if(mean(foo$Data) < quantile(foo2$Data, 0.66)){

ggplot(df1, aes(x=Year, y=Data)) +
    theme_bw() +
    geom_point(size=3, color = b) +
    geom_line() +
    scale_x_continuous(breaks=c(1980,1985,1990,1995,2000,2005,2010), limits=c(1978,2011)) +
    geom_hline(data = agasi, aes(yintercept = Z))

} else{

ggplot(df1, aes(x=Year, y=Data)) +
    theme_bw() +
    geom_point(size=3, color = c) +
    geom_line() +
    scale_x_continuous(breaks=c(1980,1985,1990,1995,2000,2005,2010), limits=c(1978,2011)) +
    geom_hline(data = agasi, aes(yintercept = Z))   

}
jazzurro
  • Thank you, that worked and made it much nicer to look at. I stuck with `tail()`, though, since I want the years to adapt to other data sets. Perhaps you could help me further: I put this into a `function(x) {your code}` and replaced "df1" with x, which worked nicely. I'd also like to add a horizontal line at the 66th percentile of foo2, so I added `hl <- quantile(foo2$Data,0.66,names=F)` and then the line `geom_hline(aes(yintercept=hl), linetype="dashed") +` to the plot. However, when I run it, "hl" can't be found, and I don't know why. Any ideas? Thanks! – balconydoor Aug 26 '14 at 07:18
  • Aye, that worked. Thanks! Any idea why it doesn't work the way I did it? Edit: I just noticed that you added a link to another post; I'll check it out. Thanks again! – balconydoor Aug 26 '14 at 08:31
  • I am not the best person to explain why your version failed; I just accept what R can interpret, a bit like learning a natural language. Anyway, if you follow your approach, `geom_hline(yintercept=quantile(foo2$Data, 0.66))` is the way to write the line. I do not think you were far off: `aes()` in geom_hline was not necessary in this case, because the line is a single constant value rather than a variable in the data. – jazzurro Aug 26 '14 at 11:35