
I'm using ggplot() to build two scatter plots of health assessment score vs. number of weeks in treatment, one for (1) male and one for (2) female patients. On each graph I also use geom_line() to overlay the regression lines for both males and females.

My question: How do I match the colors of the line overlays with the colors of the scatter plot points ('steelblue2' and 'pink3') while still retaining the legend?

I've found that if I move color outside of aes() in geom_line(), the colors of the lines and the scatter plot points match, but then the legend disappears.
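For context, ggplot2 only draws a legend for aesthetics that are mapped inside aes(); a constant color set outside aes() bypasses the color scale, so there is nothing to put in a legend. A minimal sketch that keeps the wide data, assuming ggplot2 3.4.0 or later (for `linewidth`; older versions use `size`), maps each layer's color to a constant label inside aes() and then translates the labels into the desired colors with a named vector in scale_colour_manual(). The `gender_colours` and `p_male` names are illustrative, not part of the original code.

```r
library(ggplot2)

# Subset of the question's sample data (male scores plus both regression fits)
mean_behav_by_numweeks <- data.frame(
  numweeks_round = 1:10,
  Mean_Behavior_Score_Male = c(3.32, 4.18, 3.82, 4.06, 3.33, 3.80, 3.64, 3.66, 3.37, 3.82),
  nrow_male = c(396, 323, 293, 259, 226, 217, 202, 190, 170, 167),
  lm_results_predict_male = c(3.82, 3.80, 3.78, 3.76, 3.74, 3.72, 3.70, 3.68, 3.66, 3.64),
  lm_results_predict_female = c(3.44, 3.44, 3.45, 3.45, 3.46, 3.47, 3.47, 3.48, 3.48, 3.49)
)

# Hypothetical lookup from legend label to actual colour
gender_colours <- c(Male = "steelblue2", Female = "pink3")

p_male <- ggplot(mean_behav_by_numweeks, aes(x = numweeks_round)) +
  # colour = "Male" is a constant *label* mapped inside aes(), so it gets a legend entry
  geom_point(aes(y = Mean_Behavior_Score_Male, size = nrow_male, colour = "Male")) +
  geom_line(aes(y = lm_results_predict_male, colour = "Male"), linewidth = 1) +
  geom_line(aes(y = lm_results_predict_female, colour = "Female"), linewidth = 1) +
  # The named vector turns each label into the colour requested in the question
  scale_colour_manual(name = "GenderCode", values = gender_colours) +
  scale_size_continuous(range = c(1, 7)) +
  labs(x = "Number of weeks since 1st assessment",
       y = "Mean behavior assessment score",
       size = "# members") +
  theme(legend.position = "bottom")

p_male
```

The female plot would follow the same pattern with `colour = "Female"` on the points. Reshaping to long format, as the answers below do, avoids writing one layer per gender.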

My code & a sample from my data:

library(ggplot2)

mean_behav_by_numweeks = data.frame(
  numweeks_round = c(1:10), 
  Mean_Behavior_Score_Male = c(3.32,4.18,3.82,4.06,3.33, 3.80,3.64,3.66,3.37,3.82), 
  nrow_male = c(396,323,293,259,226,217,202,190,170,167), 
  lm_results_predict_male = c(3.82,3.80,3.78,3.76,3.74, 3.72,3.70,3.68,3.66,3.64), 
  Mean_Behavior_Score_Female = c(2.91,3.79,3.65,3.41, 2.88,2.88,3.78,2.98,3.67,3.93), 
  nrow_female = c(109,82,72,74,66,60,58,56,52,50),
  lm_results_predict_female=c(3.44,3.44,3.45,3.45, 3.46,3.47,3.47,3.48,3.48,3.49))

gg_plot1 <- ggplot(mean_behav_by_numweeks, 
                   aes(numweeks_round, 
                       Mean_Behavior_Score_Male, 
                       size = nrow_male)) +  # refer to columns by bare name inside aes()
  geom_point(colour='steelblue2') +
  ggtitle(paste("Scatter plot of mean behavior assessment score by member by # weeks \n since 1st assessment for", 
                as.character(var),  # `var` is defined elsewhere in the original script (not shown)
                "among Male Medi-Cal plan members")) + 
  theme(plot.title = element_text(size=10.9, hjust = 0.5)) + 
  theme(axis.text = element_text(size=8)) + 
  scale_size_continuous(range = c(1, 7)) +
  xlab("Number of weeks since 1st assessment") + 
  ylab("Mean behavior assessment score") + 
  theme(legend.position="bottom") + 
  labs(size="# members") +
  geom_line(data=mean_behav_by_numweeks, 
            aes(numweeks_round, lm_results_predict_male, color='steelblue2'), 
            size=1) +
  geom_line(data=mean_behav_by_numweeks, 
            aes(numweeks_round, lm_results_predict_female, color='pink3'), 
            size=1) +
  scale_color_discrete(name = "GenderCode", labels = c("Female", "Male")) + 
  theme(legend.position="bottom") + 
  guides(color = guide_legend(order=1, direction="vertical"))

gg_plot1


gg_plot2 <- ggplot(mean_behav_by_numweeks, 
                   aes(numweeks_round, 
                       Mean_Behavior_Score_Female, 
                       size = nrow_female)) +  # refer to columns by bare name inside aes()
  geom_point(colour='pink3') +
  ggtitle(paste("Scatter plot of mean behavior assessment score by member by # weeks \n since 1st assessment for", 
                as.character(var),  # `var` is defined elsewhere in the original script (not shown)
                "among Female Medi-Cal plan members")) + 
  theme(plot.title = element_text(size=10.9, hjust = 0.5)) + 
  theme(axis.text = element_text(size=8)) + 
  scale_size_continuous(range = c(1, 7)) +
  xlab("Number of weeks since 1st assessment") + 
  ylab("Mean behavior assessment score") + 
  theme(legend.position="bottom") + 
  labs(size="# members") +
  geom_line(data=mean_behav_by_numweeks, 
            aes(numweeks_round, lm_results_predict_male, color='steelblue2'), 
            size=1) +
  geom_line(data=mean_behav_by_numweeks, 
            aes(numweeks_round, lm_results_predict_female, color='pink3'), size=1) +
  scale_color_discrete(name = "GenderCode", labels = c("Female", "Male")) + 
  theme(legend.position="bottom") + 
  guides(color = guide_legend(order=1, direction="vertical"))

dev.new()  # portable alternative to windows(), which only exists on Windows
gg_plot2
Z.Lin
RobertF
  • This is not how you assign colors. The best way here is to `melt` or `gather` your data into long format, creating a gender column. Then you only add one `geom_line` layer, mapping the `color` aesthetic to the gender column – Jack Brookes May 11 '18 at 18:04
  • Lots of other similar questions, e.g. https://stackoverflow.com/questions/49151314/add-legend-to-geom-density-r/49151529#49151529, https://stackoverflow.com/questions/10349206/add-legend-to-ggplot2-line-plot – Jack Brookes May 11 '18 at 18:09
  • @JackBrookes Ugh, the melt function - was hoping to avoid that... – RobertF May 11 '18 at 18:39
  • It will save you lots of time. You probably need to melt much earlier, before you do your `lm`, etc. I expect you have to do everything twice since your columns are separated by gender. – Jack Brookes May 11 '18 at 18:41
  • @JackBrookes Understood, thanks. – RobertF May 11 '18 at 18:53

2 Answers


You will want to reshape your data into long format, although you don't have to use melt or gather if you don't want to; you can stack the data manually, like this:

library(dplyr)
library(ggplot2)

new_df <- bind_rows(
  male = select(mean_behav_by_numweeks,
                numweeks_round,
                Mean_Behavior_Score = Mean_Behavior_Score_Male,
                nrow = nrow_male,
                lm_results_predict = lm_results_predict_male),
  female = select(mean_behav_by_numweeks,
                numweeks_round,
                Mean_Behavior_Score = Mean_Behavior_Score_Female,
                nrow = nrow_female,
                lm_results_predict = lm_results_predict_female),
  .id = "gender"
)
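If you'd rather not depend on dplyr, the same manual stacking can be sketched in base R with rbind(), assuming both per-gender pieces are given identical column names (rbind() matches data frame columns by name). The `d`, `male`, `female` and `new_df_base` names are illustrative.

```r
# Sample data from the question
mean_behav_by_numweeks <- data.frame(
  numweeks_round = 1:10,
  Mean_Behavior_Score_Male = c(3.32, 4.18, 3.82, 4.06, 3.33, 3.80, 3.64, 3.66, 3.37, 3.82),
  nrow_male = c(396, 323, 293, 259, 226, 217, 202, 190, 170, 167),
  lm_results_predict_male = c(3.82, 3.80, 3.78, 3.76, 3.74, 3.72, 3.70, 3.68, 3.66, 3.64),
  Mean_Behavior_Score_Female = c(2.91, 3.79, 3.65, 3.41, 2.88, 2.88, 3.78, 2.98, 3.67, 3.93),
  nrow_female = c(109, 82, 72, 74, 66, 60, 58, 56, 52, 50),
  lm_results_predict_female = c(3.44, 3.44, 3.45, 3.45, 3.46, 3.47, 3.47, 3.48, 3.48, 3.49)
)

d <- mean_behav_by_numweeks

# One data frame per gender, with identical (gender-free) column names
male <- data.frame(
  gender = "male",
  numweeks_round = d$numweeks_round,
  Mean_Behavior_Score = d$Mean_Behavior_Score_Male,
  nrow = d$nrow_male,
  lm_results_predict = d$lm_results_predict_male
)
female <- data.frame(
  gender = "female",
  numweeks_round = d$numweeks_round,
  Mean_Behavior_Score = d$Mean_Behavior_Score_Female,
  nrow = d$nrow_female,
  lm_results_predict = d$lm_results_predict_female
)

# Stack the two pieces; the result has the same layout as bind_rows(..., .id = "gender")
new_df_base <- rbind(male, female)
```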

Then you can just do

ggplot(new_df, aes(numweeks_round, Mean_Behavior_Score, size = nrow, colour = gender)) + 
    geom_point() +
    theme(plot.title = element_text(size=10.9, hjust = 0.5),
          axis.text = element_text(size=8),
          legend.position="bottom") + 
    scale_size_continuous(range = c(1, 7)) +
    labs(x = "Number of weeks since 1st assessment",
         y = "Mean behavior assessment score",
         size="# members") +
    geom_line(aes(y = lm_results_predict), size = 1) +
    scale_color_manual(name = "GenderCode", labels = c("Female", "Male"), values = c("pink3", "steelblue2")) + 
    guides(color = guide_legend(order=1, direction="vertical")) +
    facet_wrap("gender")

which gives you

[Image: resulting faceted plot, one panel per gender]
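Note that scale_color_manual() with unnamed `values` assigns colors to the levels in sorted order (here female, then male), so the code above relies on the labels and values being listed alphabetically. A sketch on the same data, assuming ggplot2 3.4.0+ for `linewidth`, that names the values and sets the factor levels explicitly so Male comes first in both the legend and the facets; the `size` mapping is moved into geom_point() so only the points are sized by member count. The data frame is rebuilt inline so the snippet runs on its own.

```r
library(ggplot2)

# Long-format sample data (same values as the question)
new_df <- data.frame(
  gender = rep(c("male", "female"), each = 10),
  numweeks_round = rep(1:10, times = 2),
  Mean_Behavior_Score = c(3.32, 4.18, 3.82, 4.06, 3.33, 3.80, 3.64, 3.66, 3.37, 3.82,
                          2.91, 3.79, 3.65, 3.41, 2.88, 2.88, 3.78, 2.98, 3.67, 3.93),
  nrow = c(396, 323, 293, 259, 226, 217, 202, 190, 170, 167,
           109, 82, 72, 74, 66, 60, 58, 56, 52, 50),
  lm_results_predict = c(3.82, 3.80, 3.78, 3.76, 3.74, 3.72, 3.70, 3.68, 3.66, 3.64,
                         3.44, 3.44, 3.45, 3.45, 3.46, 3.47, 3.47, 3.48, 3.48, 3.49)
)

# Explicit level order controls both legend order and facet order
new_df$gender <- factor(new_df$gender, levels = c("male", "female"))

p <- ggplot(new_df, aes(numweeks_round, Mean_Behavior_Score, colour = gender)) +
  geom_point(aes(size = nrow)) +  # only the points are sized by member count
  geom_line(aes(y = lm_results_predict), linewidth = 1) +
  scale_size_continuous(range = c(1, 7)) +
  # Named values are matched to levels by name, not by position
  scale_colour_manual(name = "GenderCode",
                      values = c(female = "pink3", male = "steelblue2"),
                      breaks = c("male", "female"),
                      labels = c("Male", "Female")) +
  labs(x = "Number of weeks since 1st assessment",
       y = "Mean behavior assessment score",
       size = "# members") +
  facet_wrap(~ gender) +
  theme(legend.position = "bottom")

p
```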

Weihuang Wong
  • Beautiful, thank you! This is a good opportunity for me to practice both manual stacking and melting/casting. – RobertF May 11 '18 at 19:08

One can use gather/separate/spread to first convert the data to long format and then plot it.

# Simple capitalization helper: upper-cases the first letter of each word.
# Used here to convert male/female to Male/Female.
.simpleCap <- function(x) {
  s <- strsplit(x, " ")[[1]]
  paste(toupper(substring(s, 1, 1)), substring(s, 2),
        sep = "", collapse = " ")
}


library(tidyverse)
df <- mean_behav_by_numweeks %>% 
  gather(key, value, - numweeks_round) %>%
  separate(key, c("key", "GenderCode"), sep = "_(?=[^_]*?$)") %>% #separates on last _
  mutate(GenderCode = mapply(.simpleCap,GenderCode)) %>%
  spread(key, value)
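In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer(). A sketch of the same reshape in a single call, assuming tidyr >= 1.0.0: the `.value` sentinel in `names_to` keeps Mean_Behavior_Score, nrow and lm_results_predict as separate columns, and the lowercase suffixes are capitalized up front so `_male` and `_Male` fall into the same GenderCode group (which also makes the capitalization helper unnecessary). The `wide` and `df_long` names are illustrative.

```r
library(tidyr)

# Sample data from the question
mean_behav_by_numweeks <- data.frame(
  numweeks_round = 1:10,
  Mean_Behavior_Score_Male = c(3.32, 4.18, 3.82, 4.06, 3.33, 3.80, 3.64, 3.66, 3.37, 3.82),
  nrow_male = c(396, 323, 293, 259, 226, 217, 202, 190, 170, 167),
  lm_results_predict_male = c(3.82, 3.80, 3.78, 3.76, 3.74, 3.72, 3.70, 3.68, 3.66, 3.64),
  Mean_Behavior_Score_Female = c(2.91, 3.79, 3.65, 3.41, 2.88, 2.88, 3.78, 2.98, 3.67, 3.93),
  nrow_female = c(109, 82, 72, 74, 66, 60, 58, 56, 52, 50),
  lm_results_predict_female = c(3.44, 3.44, 3.45, 3.45, 3.46, 3.47, 3.47, 3.48, 3.48, 3.49)
)

wide <- mean_behav_by_numweeks
# The score columns end in _Male/_Female but the others end in _male/_female;
# capitalize the lowercase suffixes so each gender forms a single group
names(wide) <- sub("_male$", "_Male", names(wide))
names(wide) <- sub("_female$", "_Female", names(wide))

# ".value" keeps each measurement as its own column; the second group becomes GenderCode
df_long <- pivot_longer(
  wide,
  cols = -numweeks_round,
  names_to = c(".value", "GenderCode"),
  names_pattern = "(.*)_(Male|Female)$"
)

df_long
```

The result should contain the same columns as `df` above, possibly in a different column and row order.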

Plot the graph:

ggplot(df, aes(numweeks_round, Mean_Behavior_Score, size = nrow, color = GenderCode )) +
  geom_point() + 
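  geom_line(aes(y = lm_results_predict, color = GenderCode), size = 1) +
  scale_color_manual(values = c(Female = "pink3", Male = "steelblue2")) +  # colors requested in the question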
  theme(plot.title = element_text(size=10.9, hjust = 0.5),
        axis.text = element_text(size=8),
        legend.position="bottom")  + 
  labs(x = "Number of weeks since 1st assessment",
       y = "Mean behavior assessment score",
       size="# members") +
  guides(color = guide_legend(order=1, direction="vertical")) 

[Image: resulting plot]

Data:

mean_behav_by_numweeks = data.frame(
  numweeks_round = c(1:10), 
  Mean_Behavior_Score_Male = c(3.32,4.18,3.82,4.06,3.33, 3.80,3.64,3.66,3.37,3.82), 
  nrow_male = c(396,323,293,259,226,217,202,190,170,167), 
  lm_results_predict_male = c(3.82,3.80,3.78,3.76,3.74, 3.72,3.70,3.68,3.66,3.64), 
  Mean_Behavior_Score_Female = c(2.91,3.79,3.65,3.41, 2.88,2.88,3.78,2.98,3.67,3.93), 
  nrow_female = c(109,82,72,74,66,60,58,56,52,50),
  lm_results_predict_female=c(3.44,3.44,3.45,3.45, 3.46,3.47,3.47,3.48,3.48,3.49))
MKR