
I am trying to code a plot of the data frame 'swiss' from {datasets} using {ggplot2}. I am plotting Infant.Mortality on the x-axis and Fertility on the y-axis, and I want each point colored transparent blue or transparent orange depending on whether its Education value is at/above or below the median Education. However, when I plot, I only get transparent blue points and the legend titles are off.

This is the code I have so far:

```r
swiss$color[swiss$Education >= median(swiss$Education)] <- tBlue
swiss$color[swiss$Education < median(swiss$Education)] <- tOrange

ggplot(data = swiss) +
  geom_point(mapping = aes(x = Infant.Mortality, y = Fertility, color = color)) +
  scale_color_manual(values = swiss$color,
                     labels = ">= median", "<median")
```
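A quick way to see why every point comes out blue, sketched in base R under two assumptions. First, `tBlue` and `tOrange` here are hypothetical stand-ins built with `adjustcolor()`, since the real ones came from a helper function that isn't shown. Second, it relies on how `scale_color_manual()` treats an unnamed `values` vector: it is matched to the scale's levels by position, so with a two-level `color` column only the first two elements of `swiss$color` are ever used.

```r
# Hypothetical stand-ins for the asker's semi-transparent colors;
# adjustcolor() is from base R's grDevices package.
tBlue   <- adjustcolor("blue",   alpha.f = 0.5)
tOrange <- adjustcolor("orange", alpha.f = 0.5)

swiss2 <- datasets::swiss
med <- median(swiss2$Education)
swiss2$color <- ifelse(swiss2$Education >= med, tBlue, tOrange)

# The color column has two distinct values (two levels on the scale) ...
n_levels <- length(unique(swiss2$color))

# ... but `values = swiss2$color` passes all 47 rows. Matched by position,
# level 1 gets element 1 and level 2 gets element 2 of that vector.
colors_used <- swiss2$color[seq_len(n_levels)]

print(med)          # median Education
print(colors_used)  # both come from rows at/above the median
```

Because the first two rows of `swiss` are both at or above the median, both levels are drawn with `tBlue`.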

I've also tried what was explained in this question (ggplot geom_point() with colors based on specific, discrete values) but I couldn't get it to work.

I am very new to ggplot, so any advice is appreciated!!

Phil
  • You can't pass a whole column to `values`; you need unique pairs. `values = c('tBlue' = 'blue', 'tOrange' = 'orange')` should work. However, if it doesn't... It looks like you're new to SO; welcome to the community! If you want great answers quickly, it's best to make your question reproducible. This includes sample data like the output from `dput()` or `reprex::reprex()`. Check it out: [making R reproducible questions](https://stackoverflow.com/q/5963269). – Kat Oct 26 '22 at 23:18
  • This worked! tBlue and tOrange are colors I generated from a canned function, but I totally forgot to combine arguments with c(). Thank you! – Krystal Koski Oct 27 '22 at 01:32
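Applying both fixes from the comments to the question's own code could look like the sketch below: `values` gets one named entry per level instead of the whole column, and both labels are combined with `c()`. Assumptions: `tBlue`/`tOrange` are again hypothetical `adjustcolor()` stand-ins, the `ifelse()` line replaces the two indexed assignments with an equivalent one, and the `breaks` argument is an addition that pins each label to its color.

```r
library(ggplot2)

# Hypothetical stand-ins for the asker's colors.
tBlue   <- adjustcolor("blue",   alpha.f = 0.5)
tOrange <- adjustcolor("orange", alpha.f = 0.5)

swiss2 <- datasets::swiss
med <- median(swiss2$Education)
swiss2$color <- ifelse(swiss2$Education >= med, tBlue, tOrange)

p <- ggplot(data = swiss2) +
  geom_point(mapping = aes(x = Infant.Mortality, y = Fertility, color = color)) +
  scale_color_manual(
    # One entry per level, named by the level it should be drawn for
    values = setNames(c(tBlue, tOrange), c(tBlue, tOrange)),
    # Legend keys in this order ...
    breaks = c(tBlue, tOrange),
    # ... with one label per key, combined with c()
    labels = c(">= median", "< median")
  )
p
```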

1 Answer


With ggplot we don't normally create a column of color names (that is common in base graphics). Instead, the usual way is to create a column in your data with meaningful labels, like this:

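```r
library(ggplot2)

swiss$edu_med = ifelse(swiss$Education >= median(swiss$Education), ">= Median", "< Median")

ggplot(data = swiss) +
  geom_point(mapping = aes(x = Infant.Mortality, y = Fertility, color = edu_med)) +
  # Named values tie each label to its color regardless of level order
  scale_color_manual(values = c(">= Median" = tBlue, "< Median" = tOrange))
```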

The legend labels will be automatically generated from the data values.
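Because the legend entries come from the values of `edu_med`, their order follows the (locale-dependent) sort order of those strings, which may not be the order you want. A sketch that fixes the order with `factor()` and then checks which color each group was actually drawn with, using `layer_data()`; `tBlue` and `tOrange` are again hypothetical `adjustcolor()` stand-ins.

```r
library(ggplot2)

tBlue   <- adjustcolor("blue",   alpha.f = 0.5)  # hypothetical stand-in
tOrange <- adjustcolor("orange", alpha.f = 0.5)  # hypothetical stand-in

swiss2 <- datasets::swiss
swiss2$edu_med <- factor(
  ifelse(swiss2$Education >= median(swiss2$Education), ">= Median", "< Median"),
  levels = c(">= Median", "< Median")  # legend order follows these levels
)

p <- ggplot(data = swiss2) +
  geom_point(aes(x = Infant.Mortality, y = Fertility, color = edu_med)) +
  scale_color_manual(values = c(">= Median" = tBlue, "< Median" = tOrange))

# layer_data() returns the data ggplot2 drew, including the mapped colour
drawn <- layer_data(p)
table(swiss2$edu_med, drawn$colour)
```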

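It is possible to do it the way you have in the question. In that case, use `scale_color_identity(guide = "legend", breaks = c(tBlue, tOrange), labels = c(">= median", "< median"))` instead of `scale_color_manual()`. The identity scale uses the color values stored in the column directly; `guide = "legend"` is needed because identity scales don't draw a legend by default, and the labels must be combined with `c()`.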
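A self-contained sketch of the identity-scale route, assuming hypothetical `adjustcolor()` versions of `tBlue` and `tOrange`. The check at the end confirms that the colors drawn are exactly the strings stored in the column.

```r
library(ggplot2)

tBlue   <- adjustcolor("blue",   alpha.f = 0.5)  # hypothetical stand-in
tOrange <- adjustcolor("orange", alpha.f = 0.5)  # hypothetical stand-in

swiss2 <- datasets::swiss
swiss2$color <- ifelse(swiss2$Education >= median(swiss2$Education), tBlue, tOrange)

p <- ggplot(data = swiss2) +
  geom_point(aes(x = Infant.Mortality, y = Fertility, color = color)) +
  scale_color_identity(
    guide  = "legend",                   # identity scales default to no legend
    breaks = c(tBlue, tOrange),          # which colors get a legend key, in order
    labels = c(">= median", "< median")  # one label per break
  )

# The identity scale passes the column's color strings through unchanged
drawn <- layer_data(p)
```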

Gregor Thomas