I am fairly new to R and recently used it to make some boxplots. I also added the mean and standard deviation to my boxplots. I was wondering if I could add some kind of tick mark or circle at different percentiles as well. Say I want to mark the 85th and 90th percentiles in each HOUR boxplot: is there a way to do this? My data consist of a year's worth of loads in MW for each hour, and my output consists of 24 boxplots, one per hour, for each month. I am doing one month at a time because I am not sure if there is a way to run all 96 boxplots (each month, weekday/weekend, for 4 different zones) at once. Thanks in advance for any help.

JANWD <-read.csv("C:\\My Directory\\MWBox2.csv")
JANWD.df<-data.frame(JANWD)
JANWD.sub <-subset(JANWD.df, MONTH < 2 & weekend == "NO")

KeepCols <-c("Hour" , "Houston_Load")
HWD <- JANWD.sub[ ,KeepCols]

sd <-tapply(HWD$Houston_Load, HWD$Hour, sd)
means <-tapply(HWD$Houston_Load, HWD$Hour, mean)

boxplot(Houston_Load ~ Hour, data=HWD, xlab="WEEKDAY HOURS", ylab="MW Difference", ylim= c(-10, 20), smooth=TRUE ,col ="bisque", range=0)

points(sd, pch = 22, col= "blue")
points(means, pch=23, col ="red")

#Output of the subset of data used to run the boxplot for January in Houston
str(HWD)
'data.frame':   504 obs. of  2 variables:
 $ Hour        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Houston_Load: num  1.922 2.747 -2.389 0.515 1.922 ...

#OUTPUT of the original data
str(JANWD)
'data.frame':   8783 obs. of  9 variables:
 $ Date        : Factor w/ 366 levels "1/1/2012","1/10/2012",..: 306 306 306 306 306 306 306 306 306 306 ...
 $ Hour        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ MONTH       : int  8 8 8 8 8 8 8 8 8 8 ...
 $ weekend     : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
 $ TOTAL_LOAD  : num  0.607 5.111 6.252 7.607 0.607 ...
 $ Houston_Load: num  -2.389 0.515 1.922 2.747 -2.389 ...
 $ North_Load  : num  2.95 4.14 3.55 3.91 2.95 ...
 $ South_Load  : num  -0.108 0.267 0.54 0.638 -0.108 ...
 $ West_Load   : num  0.154 0.193 0.236 0.311 0.154 ...
Gavin Simpson
Gyve
  • If your example were [reproducible](http://stackoverflow.com/q/5963269/289572) (i.e., with the data available for us to play with), I would give it a try. And I think you don't need `lattice` when using `boxplot` and `points` only. – Henrik Sep 17 '12 at 15:43
  • Sure. How can I send you the sample file format? I apologize for not being familiar with reproducible examples in R. – Gyve Sep 17 '12 at 16:03

1 Answer

Here is one way, using quantile() to compute the relevant percentiles for you. I add the marks using rug().

set.seed(1)
X <- rnorm(200)
boxplot(X, yaxt = "n")

## compute the required quantiles
qntl <- quantile(X, probs = c(0.85, 0.90))

## add them as a rug plot to the left hand side
rug(qntl, side = 2, col = "blue", lwd = 2)

## add the box and axes
axis(2)
box()

Update: In response to the OP providing str() output, here is an example similar to the data that the OP has to hand:

set.seed(1) ## make reproducible
HWD <- data.frame(Hour = rep(0:23, 10),
                  Houston_Load = rnorm(24*10))

Now, I presume you want ticks at the 85th and 90th percentiles for each Hour? If so, we need to split the data by Hour and compute the percentiles via quantile(), as I showed earlier:

quants <- sapply(split(HWD$Houston_Load, list(HWD$Hour)),
                 quantile, probs = c(0.85, 0.9))

which gives:

R> quants <- sapply(split(HWD$Houston_Load, list(HWD$Hour)),
+                  quantile, probs = c(0.85, 0.9))
R> quants
            0         1        2         3         4         5        6
85% 0.3576510 0.8633506 1.581443 0.2264709 0.4164411 0.2864026 1.053742
90% 0.6116363 0.9273008 2.109248 0.4218297 0.5554147 0.4474140 1.366114
            7         8        9       10        11        12       13       14
85% 0.5352211 0.5175485 1.790593 1.394988 0.7280584 0.8578999 1.437778 1.087101
90% 0.8625322 0.5969672 1.830352 1.519262 0.9399476 1.1401877 1.763725 1.102516
           15        16        17        18       19        20       21
85% 0.6855288 0.4874499 0.5493679 0.9754414 1.095362 0.7936225 1.824002
90% 0.8737872 0.6121487 0.6078405 1.0990935 1.233637 0.9431199 2.175961
          22        23
85% 1.058648 0.6950166
90% 1.145783 0.8436541
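
To read individual values out of quants, it may help to note that sapply() returns a matrix here, with one row per probability and one column per Hour. The following is only a minimal sketch; it rebuilds the simulated HWD from above so that it runs on its own, and the name q13 is introduced purely for illustration:

```r
## rebuild the simulated data and the quantile matrix from the answer
set.seed(1)
HWD <- data.frame(Hour = rep(0:23, 10),
                  Houston_Load = rnorm(24 * 10))
quants <- sapply(split(HWD$Houston_Load, HWD$Hour),
                 quantile, probs = c(0.85, 0.9))

dim(quants)            # 2 rows (probabilities) by 24 columns (hours)
quants["90%", "13"]    # 90th percentile for Hour 13

## the same value computed directly from the Hour 13 subset
q13 <- quantile(HWD$Houston_Load[HWD$Hour == 13], probs = 0.9)
all.equal(unname(q13), quants["90%", "13"])
```

Row names come from quantile() and column names from the levels of Hour, so both can be used for indexing by name.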

Now we can draw marks at the x locations of the boxplots:

boxplot(Houston_Load ~ Hour, data = HWD, axes = FALSE)
xlocs <- 1:24 ## where to draw marks
tickl <- 0.15 ## length of marks used
for(i in seq_len(ncol(quants))) {
    ## use tickl (defined above) rather than repeating the literal 0.15
    segments(x0 = rep(xlocs[i] - tickl, 2), y0 = quants[, i],
             x1 = rep(xlocs[i] + tickl, 2), y1 = quants[, i],
             col = c("red", "blue"), lwd = 2)
}
title(xlab = "Hour", ylab = "Houston Load")
axis(1, at = xlocs, labels = xlocs - 1)
axis(2)
box()
legend("bottomleft", legend = paste(c("0.85", "0.90"), "quantile"),
       bty = "n", lty = "solid", lwd = 2, col = c("red", "blue"))

The resulting figure should look like this:

[Figure: extended boxplot example]
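
Since the question also mentions circles, here is a hedged variant that marks the same percentiles with points() instead of segments(). It is a sketch on the simulated data, not a replacement for the code above; because points() is vectorised, no loop is needed:

```r
## rebuild the simulated data and the quantile matrix
set.seed(1)
HWD <- data.frame(Hour = rep(0:23, 10),
                  Houston_Load = rnorm(24 * 10))
quants <- sapply(split(HWD$Houston_Load, HWD$Hour),
                 quantile, probs = c(0.85, 0.9))

boxplot(Houston_Load ~ Hour, data = HWD)
xlocs <- seq_len(ncol(quants))   # boxes are drawn at x = 1, 2, ..., 24

## quants is stored column by column (85% then 90% for each hour),
## so each x location is repeated once per row of quants
points(x = rep(xlocs, each = nrow(quants)), y = as.vector(quants),
       pch = 21, col = "black", bg = c("red", "blue"))
legend("bottomleft", legend = paste(c("0.85", "0.90"), "quantile"),
       bty = "n", pch = 21, pt.bg = c("red", "blue"))
```

The two fill colours are recycled along the points, so red always lands on the 85th percentile and blue on the 90th.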

Gavin Simpson
  • Thanks Gavin. When I try that, I get the following error ---> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) : <--- I tried something like that before, but I only got the percentiles for Hour 1 and not for the rest of the hours. – Gyve Sep 17 '12 at 15:56
  • That is why a reproducible example helps. Can you make up some data in the correct format and add that to your question so we can see what you have and then advise accordingly? – Gavin Simpson Sep 17 '12 at 16:01
  • Well, I did research on the term reproducible example here, tried to create one for my example, and failed miserably. Is there a way I can email you sample data? This is my very first time on this website and I am still trying to figure out the basics. Sorry for the inconvenience, guys. – Gyve Sep 17 '12 at 17:00
  • An alternative is to run `str(obj)` where `obj` is the name of your object, so `str(HWD)` in your case. Then I can see what types of data you have etc. and make something to suit. If you post the output in your question as an edit, I'll show you how to write a reproducible example and hopefully we'll get to the answer you need. How does that sound? – Gavin Simpson Sep 17 '12 at 17:08
  • Okay, thanks! I posted the str(HWD) output, which is the subset of the original data, and also the str(JANWD) output, the original data which I used to create the subset HWD. I hope this will help. Thanks! – Gyve Sep 17 '12 at 18:18
  • There you go. Give that a try. Let us know if you still have problems. – Gavin Simpson Sep 17 '12 at 19:21
  • Great! It worked!! Awesome! Thank you Gavin! Also, is there a clever way to handle the second part of my question instead of doing each boxplot separately, to avoid copying, pasting and editing the code 96 times? There has to be a smarter way to do this than my approach of getting it done piece by piece. For example, to output those 8 boxplots for January with some combination? – Gyve Sep 17 '12 at 19:50
  • Write another Question with something specific (see how I created my example) and the expected output. I don't quite understand what you mean in the Question above regarding 96 things. Don't forget to accept this Answer (by checking the tick mark next to it) if you are happy with the solution. – Gavin Simpson Sep 17 '12 at 20:19
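
On the follow-up about avoiding 96 copies of the code, the thread never settles what output is wanted, so the following is only one possible reading: wrap the plotting steps in a function and loop over month, weekday/weekend and zone. The helper name plot_hourly_box, the output file name and the simulated data are all assumptions for illustration; only the column names (Hour, MONTH, weekend, Houston_Load, North_Load) are taken from the str(JANWD) output in the question.

```r
## simulated stand-in for the real data (2 months and 2 zones only)
set.seed(1)
JANWD <- expand.grid(Hour = 1:24, rep = 1:10, MONTH = 1:2,
                     weekend = c("NO", "YES"))
JANWD$Houston_Load <- rnorm(nrow(JANWD))
JANWD$North_Load   <- rnorm(nrow(JANWD))

## hypothetical helper: one set of hourly boxplots with percentile ticks
plot_hourly_box <- function(dat, zone, probs = c(0.85, 0.9),
                            cols = c("red", "blue"), tickl = 0.15) {
    byhour <- split(dat[[zone]], dat$Hour)
    quants <- sapply(byhour, quantile, probs = probs)
    boxplot(byhour, xlab = "Hour", ylab = zone, range = 0)
    xlocs <- seq_len(ncol(quants))
    for (i in seq_along(probs)) {
        ## row i of quants holds the i-th percentile for every hour
        segments(xlocs - tickl, quants[i, ], xlocs + tickl, quants[i, ],
                 col = cols[i], lwd = 2)
    }
    invisible(quants)
}

## loop over every month / weekend / zone combination, one plot per page
zones <- c("Houston_Load", "North_Load")
out <- file.path(tempdir(), "hourly_boxplots.pdf")
pdf(out)
for (m in sort(unique(JANWD$MONTH))) {
    for (w in c("NO", "YES")) {
        dat_sub <- JANWD[JANWD$MONTH == m & JANWD$weekend == w, ]
        for (z in zones) {
            plot_hourly_box(dat_sub, z)
            title(main = paste("Month", m, "| weekend:", w, "|", z))
        }
    }
}
dev.off()
```

With the real data, zones would list all four load columns and MONTH would run over all twelve months, which gives the 96 plots in a single PDF.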