
There is probably a very easy solution to my problem but I couldn't find a satisfying answer online.

With the following command I created a boxplot and overlaid it with the individual data points:

ggplot(data = MYdata, aes(x = Age, y = Richness)) + 
  geom_boxplot(aes(group=Age)) + 
  geom_point(aes(color = Age))

There are several things I would like to add/change:

1. Change the line color and/or fill of each boxplot (depending on "Age") using 6 different colors from left to right:

c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00")

I tried

ggplot(data = MYdata, aes(Age, Richness)) + 
  geom_boxplot(aes(group=Age)) + 
  scale_colour_manual(values = c("#E69F00", "#56B4E9", "#009E73", 
                                 "#F0E442", "#0072B2", "#D55E00")) 

but it results in a "Continuous value supplied to discrete scale" error.

2. Change the color of each data point (depending on "Age") using 6 different colors from left to right:

c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00")

I tried:

ggplot(data = MYdata, aes(Age, Richness)) + 
  geom_boxplot(aes(group=Age)) + 
  geom_point(aes(color = Age)) + 
  scale_colour_manual(values = c("#E69F00", "#56B4E9", "#009E73", 
                                 "#F0E442", "#0072B2", "#D55E00")) 

but it also results in an error:

Continuous value supplied to discrete scale

3. Change the text in the legend to "0 month", "1 month", "3 months", "6 months", "9 months", "12 months"

zx8754
Dalmuti71

1 Answer


First, providing sample data would help. Since you didn't, here is some:

library(ggplot2)
set.seed(42)  # arbitrary seed, only so the random sample is reproducible
MYdata <- data.frame(Age = rep(c(0,1,3,6,9,12), each=20),
                     Richness = rnorm(120, 10000, 2500))

Parts 1 and 2 stem from the same problem: Age is a continuous (numeric) variable, but you are trying to use it with a discrete scale by specifying colors for specific values of Age. A scale maps a variable to a visual property; for a continuous Age, there has to be a corresponding color for every possible value of Age, not just the values that happen to appear in your data. You can, however, treat Age as a categorical variable (a factor) for some aesthetics while keeping it continuous for others, such as the x position.

For part 3, the scale definition lets you assign specific labels to specific breaks in the scale. Putting this all together (and adding a scale_x_continuous() call so the x axis is labelled more like the one in your example):

ggplot(data = MYdata, aes(x = Age, y = Richness)) + 
  geom_boxplot(aes(fill=factor(Age))) + 
  geom_point(aes(color = factor(Age))) +
  scale_x_continuous(breaks = c(0, 1, 3, 6, 9, 12)) +
  scale_colour_manual(breaks = c("0", "1", "3", "6", "9", "12"),
                      labels = c("0 month", "1 month", "3 months",
                                 "6 months", "9 months", "12 months"),
                      values = c("#E69F00", "#56B4E9", "#009E73", 
                                 "#F0E442", "#0072B2", "#D55E00")) +
  scale_fill_manual(breaks = c("0", "1", "3", "6", "9", "12"),
                    labels = c("0 month", "1 month", "3 months",
                               "6 months", "9 months", "12 months"),
                    values = c("#E69F00", "#56B4E9", "#009E73", 
                               "#F0E442", "#0072B2", "#D55E00"))
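
One way to avoid repeating the same breaks, labels, and values for both scales is to convert Age to a labelled factor once and let a single manual scale cover both aesthetics. This is only a sketch: it assumes a ggplot2 version recent enough to support the `aesthetics` argument of `scale_colour_manual()` (3.0.0 or later, as far as I know), and the column name `AgeLabel`, the `age_*` vectors, and the seed are made up for illustration.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sample data is reproducible
MYdata <- data.frame(Age = rep(c(0, 1, 3, 6, 9, 12), each = 20),
                     Richness = rnorm(120, 10000, 2500))

age_values  <- c(0, 1, 3, 6, 9, 12)
age_labels  <- c("0 month", "1 month", "3 months",
                 "6 months", "9 months", "12 months")
age_colours <- c("#E69F00", "#56B4E9", "#009E73",
                 "#F0E442", "#0072B2", "#D55E00")
names(age_colours) <- age_labels  # named values are matched to factor levels

# Hypothetical helper column: Age as a factor whose levels are the legend text
MYdata$AgeLabel <- factor(MYdata$Age, levels = age_values, labels = age_labels)

p <- ggplot(MYdata, aes(x = Age, y = Richness)) +
  geom_boxplot(aes(fill = AgeLabel)) +
  geom_point(aes(colour = AgeLabel)) +
  scale_x_continuous(breaks = age_values) +
  # one scale definition applied to both the colour and fill aesthetics
  scale_colour_manual(name = "Age", values = age_colours,
                      aesthetics = c("colour", "fill"))
print(p)
```

Because the legend text now comes from the factor levels, no `breaks` or `labels` arguments are needed in the scale itself.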


With this color scheme, the points that fall inside a box are not visible, since they are the same color as the box's fill. Leaving the boxes hollow and drawing their outlines in the group color may work better:

ggplot(data = MYdata, aes(x = Age, y = Richness)) + 
  geom_boxplot(aes(colour=factor(Age)), fill=NA) + 
  geom_point(aes(color = factor(Age))) +
  scale_x_continuous(breaks = c(0, 1, 3, 6, 9, 12)) +
  scale_colour_manual(breaks = c("0", "1", "3", "6", "9", "12"),
                      labels = c("0 month", "1 month", "3 months",
                                 "6 months", "9 months", "12 months"),
                      values = c("#E69F00", "#56B4E9", "#009E73", 
                                 "#F0E442", "#0072B2", "#D55E00"))
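
If the box outlines should stay a single neutral color while the points keep their per-age colors (a variation that comes up in the comments), the outline color can be set as a constant outside `aes()` and the boxes split with `group`. A minimal sketch, assuming the same kind of sample data; the seed, the `age_colours` name, and the shade of grey are arbitrary choices:

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sample data is reproducible
MYdata <- data.frame(Age = rep(c(0, 1, 3, 6, 9, 12), each = 20),
                     Richness = rnorm(120, 10000, 2500))

age_colours <- c("#E69F00", "#56B4E9", "#009E73",
                 "#F0E442", "#0072B2", "#D55E00")

p_grey <- ggplot(MYdata, aes(x = Age, y = Richness)) +
  # colour set outside aes(): every outline is grey and is not part of the scale;
  # group = factor(Age) still gives one box per distinct age
  geom_boxplot(aes(group = factor(Age)), colour = "grey50", fill = NA) +
  # only the points are mapped to the colour scale
  geom_point(aes(colour = factor(Age))) +
  scale_x_continuous(breaks = c(0, 1, 3, 6, 9, 12)) +
  scale_colour_manual(name = "Age", values = age_colours)
print(p_grey)
```

Anything placed inside `aes()` is mapped through a scale, while a constant given outside `aes()` is applied as-is, which is why the points are unaffected by the grey outline.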


Finally, consider whether you really need to color each age differently, since the ages are already distinguished by the x axis.
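
If the colors are kept anyway, the legend becomes redundant once the month labels sit on the x axis. A sketch of that variant, assuming the same kind of sample data and using `theme(legend.position = "none")` (mentioned in the comments) to hide the legend; the `age_*` names and the seed are illustrative only:

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sample data is reproducible
MYdata <- data.frame(Age = rep(c(0, 1, 3, 6, 9, 12), each = 20),
                     Richness = rnorm(120, 10000, 2500))

age_values  <- c(0, 1, 3, 6, 9, 12)
age_labels  <- c("0 month", "1 month", "3 months",
                 "6 months", "9 months", "12 months")
age_colours <- c("#E69F00", "#56B4E9", "#009E73",
                 "#F0E442", "#0072B2", "#D55E00")

p_nolegend <- ggplot(MYdata, aes(x = Age, y = Richness)) +
  geom_boxplot(aes(colour = factor(Age)), fill = NA) +
  geom_point(aes(colour = factor(Age))) +
  # the month labels go on the axis, so the legend adds no information
  scale_x_continuous(breaks = age_values, labels = age_labels) +
  scale_colour_manual(values = age_colours) +
  theme(legend.position = "none")
print(p_nolegend)
```

With closely spaced ages (0 and 1 here) the longer axis labels may overlap, so shorter labels or a rotated axis text could be needed.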

Brian Diggs
  • Thank you so much, Brian! Don't really know (yet) how to generate a random data set in R. Thanks for taking this on anyway! I have a follow-up question based on your suggestion to keep the boxes blank. How can I change the LINE color of each box (e.g. to gray)? I changed the fill color with `geom_boxplot(aes(colour=factor(Age)), fill="gray80")`. Then tried `scale_colour_manual(breaks = c("0", "1", "3", "6", "9", "12"), values = c("gray80", "gray80", "gray80", "gray80", "gray80", "gray80"))` but then the data points also became gray. Of course I'd like to keep the points colored. – Dalmuti71 May 30 '12 at 04:31
  • `geom_boxplot(aes(position = factor(Age)), colour = "grey", fill = NA)` or `geom_boxplot(aes(group = factor(Age)), colour = "grey", fill = NA)` – Sandy Muspratt May 30 '12 at 07:32
  • Thanks, Sandy! Just for my own understanding: what does "position = factor(Age)" or "group = factor(Age)" do so that I can determine the line color? – Dalmuti71 May 30 '12 at 22:35
  • `group=factor(Age)` says that there should be a separate boxplot for every different value of `Age` (that is, the age should be treated categorically for purposes of determining what different boxplots there are). I don't know what the `position=factor(Age)` does. – Brian Diggs May 30 '12 at 22:40
  • @Dalmuti71 `colour` refers to the line/border and `fill` refers to the interior. Play with a `qplot(x,y,data, geom = "polygon")` or `geom="hist"` to clearly show the difference. – isomorphismes Aug 25 '12 at 23:50
  • How can you color but omit the legend? After all, it looks redundant? – Martin Velez Feb 26 '14 at 04:22
  • @MartinVelez You can suppress the entire legend with `theme(legend.position = "none")` – Brian Diggs Mar 05 '14 at 17:45
  • Is there a reason for using the character array `c("0", "1", "3", "6", "9", "12")` in the `breaks` argument to `scale_colour_manual`? Numerical array seems to work as well. – Sid Apr 18 '17 at 17:02
  • @Sid I think it was related to matching the labels of the factors rather than the levels of the factors, but I'm not completely sure. And things may have changed since then as well. – Brian Diggs Apr 24 '17 at 18:50
  • Yes, that solution solves many problems with obscure `ggplot`. I didn't know it silently treats any variable as numeric unless you pass it `as.factor`, even if passed to a `group=` param. What a mess. – ivan866 Feb 28 '20 at 13:23
  • This is still a very valuable hint. Despite using `group=` the plot did not show the desired result. Using `factor()` solved the issue immediately. – Dr_Be Nov 23 '22 at 10:02