This question is a follow-up to my previous question: Adding color code (fill) to vis_miss plot

I would like to visualize the "missing info" in a data frame using geom_raster from ggplot2 in R while also highlighting some additional data structure using color-coding.

Solution attempt:

library(tidyverse)
x11()
airquality %>%
  mutate(id = row_number()) %>%
  gather(-c(id,Month), key = "key", value = "val") %>%
  mutate(isna = is.na(val)) %>%
  mutate(Month=as.factor(ifelse(isna==TRUE,NA,Month)))  %>%
  ggplot(aes(key, id, fill = Month)) +
    geom_raster() +
    labs(x = "Variable",
         y = "Row Number", title = "Missing values in rows") +
    coord_flip()

This is almost what I want, but it would be nicer to separate the month and NA legends. Is that possible? (Note that my system does not allow me to use transparency (alpha)).

M--
user98563

1 Answer

Here, I removed the legend for NA. If this doesn't serve your purpose properly, I can think of a hacky solution to add another legend for data vs. missing.

library(tidyverse)

airquality %>%
  mutate(id = row_number()) %>%
  gather(-c(id, Month), key = "key", value = "val") %>%
  mutate(isna = is.na(val)) %>%
  # Month_Dummy is NA wherever the value is missing; Month itself stays complete
  mutate(Month_Dummy = as.factor(ifelse(isna, NA, Month))) %>%
  mutate(Month = as.factor(Month)) %>%
  ggplot() +
  # First layer colors every cell by month; second layer draws the NA cells on top
  geom_raster(aes(key, id, fill = Month)) +
  geom_raster(aes(key, id, fill = Month_Dummy)) +
  labs(x = "Variable",
       y = "Row Number", title = "Missing values in rows") +
  coord_flip()
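
To see what the helper column contains, it can help to build it for a single variable on a few rows. This is only a base-R sketch for inspection, not part of the plot code; the names `aq`, `ozone_na` and `month_dummy` are made up for illustration.

```r
# First six rows of the built-in airquality data; Ozone is missing in row 5
aq <- head(airquality)

# TRUE where the Ozone value is missing
ozone_na <- is.na(aq$Ozone)

# Month where Ozone is observed, NA where it is missing
month_dummy <- as.factor(ifelse(ozone_na, NA, aq$Month))

# Side-by-side view of the value, the complete Month and the helper column
print(data.frame(Ozone = aq$Ozone, Month = aq$Month, month_dummy))
```

The helper column keeps the month as a factor level for observed cells and is `NA` for the missing one, which is what the second `geom_raster` layer fills by.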

Update:

The hacky solution I can think of is to add a geom_point for just one of the missing values and use it to create the legend for missing data points. It's not the best in terms of appearance, but it is the only solution I can think of.

library(tidyverse)

# Build the long data frame once so it can be reused for the legend-only layer
airquality %>%
  mutate(id = row_number()) %>%
  gather(-c(id, Month), key = "key", value = "val") %>%
  mutate(isna = is.na(val)) %>%
  mutate(Month_Dummy = as.factor(ifelse(isna, NA, Month))) %>%
  mutate(Month = as.factor(Month)) -> aqdf

ggplot(data = aqdf, aes(key, id)) +
  geom_raster(aes(fill = Month)) +
  geom_raster(aes(fill = Month_Dummy)) +
  # One missing row with x = NA: no visible point, but it adds a colour legend
  geom_point(data = aqdf[aqdf$isna, ][1, ],
             aes(NA, id, colour = "NA"),
             inherit.aes = FALSE) +
  scale_color_manual(values = c("grey50")) +
  labs(x = "Variable", y = "Row Number",
       title = "Missing values in rows", color = "Missing") +
  coord_flip() +
  # Grey key background so the legend entry resembles an NA cell
  theme(legend.key = element_rect(fill = "grey50"))
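
As a side note, `gather()` is superseded in current tidyr releases in favour of `pivot_longer()`. A sketch of the same reshaping step, assuming tidyr 1.0.0 or later is installed (the result name `aqdf_long` is only for illustration):

```r
library(dplyr)
library(tidyr)

aqdf_long <- airquality %>%
  mutate(id = row_number()) %>%
  # Long format: one row per (id, variable) pair, keeping id and Month as-is
  pivot_longer(-c(id, Month), names_to = "key", values_to = "val") %>%
  # mutate() evaluates in order, so Month_Dummy still sees the integer Month
  mutate(isna = is.na(val),
         Month_Dummy = as.factor(ifelse(isna, NA, Month)),
         Month = as.factor(Month))
```

The rows come out in a different order than with `gather()`, which should not matter for `geom_raster` since every cell is positioned by `key` and `id`.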

M--
  • This is better than the previous, but I would prefer to have a second legend for the NA square, similar to what `vis_miss` in the `visdat` package does. Would you use two boxes for the missing/available data, and if so how would you color the available data? – user98563 Sep 16 '19 at 21:12
  • This is fine. I added a line with theme(legend.key = element_rect(fill = "grey50")) at the end to make it look nicer. – user98563 Sep 16 '19 at 21:49
  • @user98563 nice one. I edited the answer to use `NA` instead of `key` in `geom_point` so that no actual point is drawn (if you look closely at the graph produced by the previous code, you can see the point). I guess this should now completely meet your needs. Hope to see you around this community actively. Cheers. – M-- Sep 16 '19 at 21:57