
I have data with the following columns: lot, sublot, size, data. There are multiple lots, each lot can have multiple sublots, and each sublot has sizes from 1 to 4.

I have created a boxplot for this data using the following code:

df <- 
  readXL("Z:/R_Files/example.xlsx",
  rownames=FALSE, header=TRUE, na="", sheet="Sheet1", 
  stringsAsFactors=TRUE)

x11()
par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
    xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
    las=2,
    data=df)
title(xlab='Size.Sublot.Lot', line=9)

I want to use the boxfill argument to color each box based on its lot. I have seen two solutions:

  1. Create a vector and explicitly specify the colors to be used, e.g. colr = c("red", "red", "red", ..., "green", "green", "green", ..., "blue"). The problem with this solution is that it requires me to know a priori the number of lots in df and the number of times each color needs to be repeated.
  2. Use an "ifelse" statement. The problem with this solution is that (a) I need to know the number of lots and (b) I need to create multiple nested ifelse statements.

I would prefer to create a "dynamic" solution which creates the color vector based on the number of lot entries I have in my file.

I have tried to create:

uniqlot <- unique(df$lot)
colr <- palette(rainbow(length(uniqlot)))

but am stuck since the entries in the colr vector do not repeat for the number of unique combinations of size.sublot.lot. Note: I want all boxplots for lot ABC to be colored with one color, all boxplots for lot DEF to be colored with another color etc.

I am attaching a picture of the uncolored boxplot: [Uncolored Boxplot]

Raw data (example.xlsx) can be accessed at the following link: example.xlsx

VikG
  • It's easier to help if you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data (that's not a private file on your machine). See the provided link for tips on how to do that. – MrFlick Oct 12 '16 at 18:47
  • Thank you very much for your help and the pointer. I was wondering how to place the xlsx file. I have placed it on a Google drive with an accessible link. – VikG Oct 12 '16 at 19:18

2 Answers

This is what I would do:

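n1 <- length(unique(df$sublot))
n2 <- length(unique(df$size))
colr <- rainbow(length(unique(df$lot)))  # one colour per lot; palette() would return the previous palette, so call rainbow() directly
colr <- rep(colr, each = n1*n2)          # repeat each lot's colour once per size-by-sublot box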

boxplot(data ~ size*sublot*lot,
        col = colr,
        xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
        las=2,
        data=df)
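
The repeat-each idea can be sanity-checked without the spreadsheet. Below is a minimal sketch on made-up data (the `demo` data frame and its lot/sublot names are hypothetical stand-ins, not the contents of example.xlsx). It assumes lot is the last term in the formula, so it varies slowest across the boxes, and it uses `plot = FALSE` to get the box names back without drawing anything.

```r
# Hypothetical stand-in data: 3 lots x 2 sublots x 4 sizes, 5 values per cell
set.seed(1)
demo <- expand.grid(rep = 1:5, size = 1:4,
                    sublot = c("S1", "S2"),
                    lot = c("ABC", "DEF", "GHI"),
                    stringsAsFactors = TRUE)
demo$data <- rnorm(nrow(demo))

n_lot    <- length(unique(demo$lot))
n_sublot <- length(unique(demo$sublot))
n_size   <- length(unique(demo$size))

# One colour per lot, each repeated once per size-by-sublot box
colr <- rep(rainbow(n_lot), each = n_size * n_sublot)

# plot = FALSE returns the box statistics (including names) without drawing
bp <- boxplot(data ~ size * sublot * lot, data = demo, plot = FALSE)

# The colour vector should have exactly one entry per box
stopifnot(length(colr) == length(bp$names))
```

If the check fails on real data, the likely cause is that the number of boxes is not the simple product of the three counts (for example, unused factor levels), in which case the colour vector needs to be built from the box names instead.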

Using ggplot2:

library(ggplot2)
df$size <- as.factor(df$size)

ggplot(df, aes(sublot, data, group = interaction(size, sublot), col = size)) +
    geom_boxplot() +
    facet_wrap(~lot, nrow = 1)


Also, you can get rid of df$size <- as.factor(df$size) if you want continuous colour.

parksw3
  • Wow - thank you! I have not used ggplot and was trying to do this in the base R package. This works. I did not find the ggplot package but have installed ggplot2. – VikG Oct 12 '16 at 21:26
  • @VikG Another mistake. My bad. It is supposed to be ggplot2. You can also do it with base R package. `data ~ size * sublot * lot` creates 16 (4 sizes * 4 sublots) "boxes" for each lot so you just need to repeat each colour 16 times, which is done by the first four lines. – parksw3 Oct 12 '16 at 21:28
  • Oops - hit enter too soon. One request on this - the plot shows only the lot number along the x axis - while the boxplots are for the lot.sublot.size combinations. How do I get the x axis to also show the lot.sublot.size along the x axis? I tried editing the code above to say ... aes(lot.sublot... and ...aes(lot*sublot... but both error out. Again - thanks for your help. – VikG Oct 12 '16 at 21:30
  • @VikG Unfortunately, I tried to figure that out before you commented but I haven't been able to do that. You'll have to stick with boxplot unless someone else comes up with an answer. You could try using `facet_wrap` instead and try what I have above. It's different from what you asked but I think that might be easier to read than having everything on the x-axis. Let me know what you think. – parksw3 Oct 12 '16 at 21:39

Thanks to the pointers provided in the responses, and after digging around a little more, I was able to find a solution to my own question. I am posting the code in case someone needs to replicate it.

Here is a picture of the boxplot this code creates (the one I wanted): [colored boxplot]

df <- 
      readXL("Z:/R_Files/example.xlsx",
      rownames=FALSE, header=TRUE, na="", sheet="Sheet1", 
      stringsAsFactors=TRUE)

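unqlot    <- unique(df$lot)
unqsublot <- unique(df$sublot)
unqsize   <- unique(df$size)
# One colour per lot. Call rainbow() directly: palette(x) sets the palette
# but returns the *previous* one, so its return value is not the new colours.
cul       <- rainbow(length(unqlot))
culur     <- character()

# Loop over lots (not sizes): each lot contributes size-by-sublot boxes
for (i in seq_along(unqlot)) {
    culur_temp <- rep(cul[i], length(unqsize) * length(unqsublot))
    culur <- c(culur, culur_temp)
}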

par(mar=c(10.1, 5.1, 4.1, 2.1))
boxplot(data ~ size*sublot*lot,
    xlab="", ylab="Data", main="Data by Size, Sublot, Lot",
    col = culur,
    las=2,
    data=df)
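
For anyone who would rather not count sizes and sublots at all, here is a loop-free sketch of the same idea. It reads the lot back out of the box names that `boxplot()` itself generates, so the colour vector always lines up with the boxes. The `demo` data are made up for illustration, and the parsing assumes lot is the last term in the formula and that lot names contain no "." character.

```r
# Hypothetical stand-in data: 3 lots x 2 sublots x 4 sizes, 5 values per cell
set.seed(1)
demo <- expand.grid(rep = 1:5, size = 1:4,
                    sublot = c("S1", "S2"),
                    lot = c("ABC", "DEF", "GHI"),
                    stringsAsFactors = TRUE)
demo$data <- rnorm(nrow(demo))

# Box names look like "1.S1.ABC"; plot = FALSE computes them without drawing
bp <- boxplot(data ~ size * sublot * lot, data = demo, plot = FALSE)

# Keep only the text after the last "." (the lot)
box_lot <- sub(".*\\.", "", bp$names)

# Named lookup table: one rainbow colour per distinct lot
lot_cols <- setNames(rainbow(length(unique(box_lot))), unique(box_lot))

# One colour per box, in box order
culur2 <- unname(lot_cols[box_lot])

boxplot(data ~ size * sublot * lot, data = demo,
        col = culur2, las = 2, xlab = "", ylab = "Data")
```

The trade-off is that this depends on the naming scheme of the boxes, whereas the loop above depends on every lot having the same number of size-by-sublot boxes.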
VikG