
I'm trying to create a simple plot of point estimates with confidence intervals. The plot looks the way I want until I try to change the point shape and/or the color. As soon as I change either, I get "Warning: Removed 4 rows containing missing values (geom_point)." and end up with a blank plot.

I've tried the suggestions from several related questions and a couple of other places, but to no avail.

A Reproducible Example

library(ggplot2)
set.seed(1)

# Create some sample data 

point_est <- 4:1
se        <- runif(4)

df <- data.frame(point_est = point_est,
                 se        = se,
                 lower     = point_est - se,
                 upper     = point_est + se,
                 year      = c("c", "c", "p", "p"),
                 group     = letters[1:4])

group_names <- paste0("Display Name for \n Group ", LETTERS[1:4])
names(group_names) <- letters[1:4]

legend_text <- c("Previous Year Rate with 95% Confidence Intervals",
                 "Current Year Rate with 95% Confidence Intervals")
names(legend_text) <- c("p", "c")

df$year = factor(df$year, levels = names(legend_text), labels = legend_text)
df$group = factor(df$group, levels = names(group_names), labels = group_names)

# Plot looks good except the colors and shape of the points need changing
ggplot(df, aes(x = group, y = point_est, color = year, label= year, shape = year)) +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=.3) +
  geom_point(size = 3.2) +
  scale_x_discrete(drop=FALSE) +
  scale_y_continuous(sec.axis = sec_axis(~.*3, name = "This is my Right Axis")) +
  labs(x = NULL,
       y = "This is my Left Axis") +
  theme(legend.title = element_blank(),
        legend.position = "bottom",
        legend.background = element_blank(),
        legend.box.background = element_rect(colour = "black"),
        panel.border = element_rect(colour = "black", fill=NA),
        panel.background = element_blank()) 

# now change the shapes of the points and the colors of the error bars
shapes <- c(17, 15)
names(shapes) <- names(legend_text)

colors <- c("pink", "blue")
names(colors) <- names(legend_text)

ggplot(df, aes(x = group, y = point_est, color = year, label= year, shape = year)) +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=.3) +
  geom_point(size = 3.2) +
  scale_x_discrete(drop=FALSE) +
  scale_y_continuous(sec.axis = sec_axis(~.*3, name = "This is my Right Axis")) +
  scale_shape_manual(values = shapes) +
  scale_color_manual(values = colors) +
  labs(x = NULL,
       y = "This is my Left Axis") +
  theme(legend.title = element_blank(),
        legend.position = "bottom",
        legend.background = element_blank(),
        legend.box.background = element_rect(colour = "black"),
        panel.border = element_rect(colour = "black", fill=NA),
        panel.background = element_blank()) 
#> Warning: Removed 4 rows containing missing values (geom_point).

# Blank plot now and warnings:(

2 Answers

If you pass the vectors directly to the scales, it will work: use c(17, 15) as the values for scale_shape_manual and c("pink", "blue") as the values for scale_color_manual. Alternatively, keep the shapes and colors vectors but do not assign names to them. The names are what is throwing it off: they do not match the levels of year, whereas unnamed values are simply assigned to the levels in order.

ggplot(df, aes(x = group, y = point_est, color = year, label= year, shape = year)) +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=.3) +
  geom_point(size = 3.2) +
  scale_x_discrete(drop=FALSE) +
  scale_y_continuous(sec.axis = sec_axis(~.*3, name = "This is my Right Axis")) +
  scale_shape_manual(values = c(17, 15)) +
  scale_color_manual(values = c("pink", "blue")) +
  labs(x = NULL,
       y = "This is my Left Axis") +
  theme(legend.title = element_blank(),
        legend.position = "bottom",
        legend.background = element_blank(),
        legend.box.background = element_rect(colour = "black"),
        panel.border = element_rect(colour = "black", fill=NA),
        panel.background = element_blank()) 



# If you want to use the vectors, do not name them
shapes <- c(17, 15)

colors <- c("pink", "blue")

ggplot(df, aes(x = group, y = point_est, color = year, label= year, shape = year)) +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=.3) +
  geom_point(size = 3.2) +
  scale_x_discrete(drop=FALSE) +
  scale_y_continuous(sec.axis = sec_axis(~.*3, name = "This is my Right Axis")) +
  scale_shape_manual(values = shapes) +
  scale_color_manual(values = colors) +
  labs(x = NULL,
       y = "This is my Left Axis") +
  theme(legend.title = element_blank(),
        legend.position = "bottom",
        legend.background = element_blank(),
        legend.box.background = element_rect(colour = "black"),
        panel.border = element_rect(colour = "black", fill=NA),
        panel.background = element_blank()) 
Mike

This is happening because you used names(legend_text) rather than legend_text as the names of your shapes and colors vectors. legend_text is what matches the values in the year column of your data. Do names(colors) <- legend_text and likewise for shapes and the plot will work. Nothing was plotted because the names of the colors and shapes vectors did not match any of the levels of df$year, so no colors or shapes were assigned for the actual values in year.
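As a minimal sketch of that fix, the snippet below names the vectors by the labels rather than by the codes and checks that every level of year now has an entry. It rebuilds only the year column from the question so that it runs on its own; the lookup on the last line is an illustration of name matching, not ggplot2's internal code.

```r
# Labels keyed by the original codes, as in the question
legend_text <- c(p = "Previous Year Rate with 95% Confidence Intervals",
                 c = "Current Year Rate with 95% Confidence Intervals")

# The year column after relabeling: its levels are the long labels
year <- factor(c("c", "c", "p", "p"),
               levels = names(legend_text), labels = legend_text)

# Name the manual values by the labels, not by "p" and "c"
shapes <- setNames(c(17, 15), unname(legend_text))
colors <- setNames(c("pink", "blue"), unname(legend_text))

# Every level now has a matching name, so nothing maps to NA
all(levels(year) %in% names(shapes))

# Looking up each row's shape by name (illustration only)
row_shapes <- unname(shapes[as.character(year)])
```

With these vectors, scale_shape_manual(values = shapes) and scale_color_manual(values = colors) can be used exactly as in the question's second plot.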

It looks like maybe you got tripped up by levels vs. labels in the factor function. By default, the levels are the existing set of unique values in the data and the labels are set equal to the levels. However, if you include a labels argument in factor, the data values get relabeled to be the values in the labels argument.
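A small base R sketch of the levels/labels distinction, using shortened labels ("Previous Year", "Current Year") that are made up here for readability:

```r
# Raw codes as they appear in the data
year_raw <- c("c", "c", "p", "p")

# Without labels: the levels are just the unique values (sorted)
plain <- factor(year_raw)
levels(plain)

# With labels: `levels` says which codes to look for and in what order,
# `labels` says what those codes are renamed to
relabeled <- factor(year_raw,
                    levels = c("p", "c"),
                    labels = c("Previous Year", "Current Year"))
levels(relabeled)
as.character(relabeled)

# The original codes no longer exist in the factor
"c" %in% levels(relabeled)
```

After relabeling, anything that is matched against the factor by name (such as a named values vector in a manual scale) has to use the labels.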

To make this concrete, note in the output below that the names of the shapes and colors vectors are p and c, which differ from the values in df$year.

> df[ , "year", drop=FALSE]
                                              year
1  Current Year Rate with 95% Confidence Intervals
2  Current Year Rate with 95% Confidence Intervals
3 Previous Year Rate with 95% Confidence Intervals
4 Previous Year Rate with 95% Confidence Intervals

> levels(df$year)
[1] "Previous Year Rate with 95% Confidence Intervals" "Current Year Rate with 95% Confidence Intervals"


> shapes
 p  c 
17 15 
> colors
     p      c 
"pink" "blue"
eipi10
  • @eipi10 and Mike, thanks for the help. This works, with one exception on my actual data. Say df is now subset to just the current year data. The plot is correct, except the legend only contains the current year. I'd like the legend to still show both levels, and the x axis to show all 4 groups, even though 2 of the groups will be empty. – BrianDavisStats Mar 23 '18 at 21:22
  • You can add `drop=FALSE` to the `scale_***_manual` calls, just as with the `scale_x_discrete` call. – eipi10 Mar 23 '18 at 21:54
  • I was an idiot. I did that for the x axis categories but forgot that wasn't for the legend also. Thanks so much for the help. – BrianDavisStats Mar 23 '18 at 21:57
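
A minimal sketch of the drop = FALSE suggestion from the comments, using a cut-down version of the question's data with shortened labels (both are assumptions made for brevity). The show.legend = TRUE argument is also an assumption on my part: recent ggplot2 releases appear to need it before keys for unused levels are drawn, while older versions should work without it.

```r
library(ggplot2)

legend_text <- c("Previous Year", "Current Year")

df <- data.frame(
  group     = factor(letters[1:4]),
  point_est = 4:1,
  year      = factor(c("c", "c", "p", "p"),
                     levels = c("p", "c"), labels = legend_text)
)

# Subsetting rows keeps the unused factor levels on both columns
current_only <- df[df$year == "Current Year", ]

# Manual values named by the labels so they match the levels of year
shapes <- setNames(c(17, 15), legend_text)
colors <- setNames(c("pink", "blue"), legend_text)

p <- ggplot(current_only,
            aes(x = group, y = point_est, color = year, shape = year)) +
  geom_point(size = 3.2, show.legend = TRUE) +          # assumption: see note above
  scale_x_discrete(drop = FALSE) +                      # keep empty groups on the axis
  scale_shape_manual(values = shapes, drop = FALSE) +   # keep unused level in legend
  scale_color_manual(values = colors, drop = FALSE)

p
```

This only works because year and group are factors whose levels survive the subset; calling droplevels() or re-creating the factors from the subset would discard the empty levels before the scales ever see them.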