
I'm creating a visualization of missing data by slightly tweaking some of the code from the missmap function in the Amelia package. I want to draw borders around my rectangles, but I can't figure out a way to do that in ggplot2.

I found the borders() function, but that appears to be meant for map work. I also tried geom_rect, but it seems that would require me to specify mins and maxes. geom_raster does exactly what I need, except that I can't figure out how to specify borders.

This example code creates the visualization that I'm imagining, but I have more variables in the "real" version and I'd like to be able to outline each variable (var1, var2, etc.) with a line (border).

#Dataset
missmap_data_test <- data.frame(var1 = c(11, 26, NA, NA, 15),
                                var2 = c(NA, NA, 0, NA, 1))

#Create Function
ggplot_missing <- function(x){
  x %>% 
    is.na %>%
    melt %>%
    ggplot(data = .,
           aes(x = Var2,
               y = Var1)) +
    geom_raster(aes(fill = value)) +
    scale_fill_grey(name = "",
                    labels = c("Present","Missing")) +
    theme_minimal() + 
    theme(axis.text.x  = element_text(angle=90, hjust=1)) + 
    labs(x = "Variables in Dataset",
         y = "Observations")
}

#Feed the function my new data
ggplot_missing(missmap_data_test)
Emily Bovee

1 Answer


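As @Axeman suggests, geom_tile does the job. I've updated your code to give an example below. Here, colour sets the colour of the border, while size sets its thickness. (In ggplot2 3.4.0 and later, line thickness is set with linewidth, and size is deprecated for this purpose.)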

#Dataset
missmap_data_test <- data.frame(var1 = c(11, 26, NA, NA, 15),
                                var2 = c(NA, NA, 0, NA, 1))

# Load libraries
library(dplyr)
library(ggplot2)
library(reshape2)

#Create Function
ggplot_missing <- function(x){
  x %>%
    is.na %>%
    melt %>%
    ggplot(data = .,
           aes(x = Var2,
               y = Var1)) +
    geom_tile(aes(fill = value), colour = "#FF3300", size = 2) +
    scale_fill_grey(name = "",
                    labels = c("Present","Missing")) +
    theme_minimal() +
    theme(axis.text.x  = element_text(angle=90, hjust=1)) +
    labs(x = "Variables in Dataset",
         y = "Observations")
}

#Feed the function my new data
ggplot_missing(missmap_data_test)

Created on 2019-05-30 by the reprex package (v0.3.0)

If you're getting notches in the top-left corner of each tile (discussed here and apparent in the plot above), you may want to update to the development version of ggplot2 by running devtools::install_github("tidyverse/ggplot2"). For example, compare the plot above with the plot below:


Update

I assume this is a toy example, so I've tried to come up with a generic solution. Here, I've created a function called boxy that will make a data frame for geom_rect based on the original data frame.

#Dataset
missmap_data_test <- data.frame(var1 = c(11, 26, NA, NA, 15),
                                var2 = c(NA, NA, 0, NA, 1))

# Function for making box data frame
boxy <- function(df){
  data.frame(xmin = seq(0.5, ncol(df) - 0.5),
             xmax = seq(1.5, ncol(df) + 0.5),
             ymin = 0.5, ymax = nrow(df) + 0.5)
}

# Load libraries
library(dplyr)
library(ggplot2)
library(reshape2)

#Create Function
ggplot_missing <- function(x){
  # One outline rectangle per variable (column)
  df_box <- boxy(x)
  # Long-format missingness indicator: Var1 = row, Var2 = variable
  df_rast <- x %>% is.na %>% melt

  ggplot() +
    # Cells shaded by missingness
    geom_raster(data = df_rast,
                aes(x = Var2,
                    y = Var1,
                    fill = value)) +
    # Unfilled rectangles drawn on top as the per-variable borders
    geom_rect(data = df_box,
              aes(xmin = xmin, xmax = xmax,
                  ymin = ymin, ymax = ymax),
              colour = "#FF3300", fill = NA, size = 3) +
    scale_fill_grey(name = "",
                    labels = c("Present","Missing")) +
    theme_minimal() +
    theme(axis.text.x  = element_text(angle = 90, hjust = 1)) +
    labs(x = "Variables in Dataset",
         y = "Observations")
}

#Feed the function my new data
ggplot_missing(missmap_data_test)

If you add a third variable (i.e., column) to your data frame, you get something like this:
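To see why this generalises, it can help to inspect what boxy returns for a three-column data frame. The sketch below repeats boxy as defined above and uses only base R; the var3 column and the name missmap_data_test3 are made up for illustration. It assumes the raster cells are centred on integer positions (1, 2, 3, ...), which is why each rectangle runs from half a unit below to half a unit above.

```r
# boxy() as defined in the answer: one row per column of df,
# each row describing a rectangle that spans all observations.
boxy <- function(df){
  data.frame(xmin = seq(0.5, ncol(df) - 0.5),
             xmax = seq(1.5, ncol(df) + 0.5),
             ymin = 0.5, ymax = nrow(df) + 0.5)
}

# Hypothetical three-variable dataset; var3 is invented for this example
missmap_data_test3 <- data.frame(var1 = c(11, 26, NA, NA, 15),
                                 var2 = c(NA, NA, 0, NA, 1),
                                 var3 = c(3, NA, 7, 2, NA))

# One rectangle per variable, each covering rows 1 to 5
boxes <- boxy(missmap_data_test3)
print(boxes)
```

The result has three rows: the rectangles span x from 0.5 to 1.5, 1.5 to 2.5, and 2.5 to 3.5, and all of them span y from 0.5 to 5.5. Passing missmap_data_test3 to ggplot_missing should then draw one border per variable with no further changes.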

Dan
  • Thank you @Lyngbakr! This is very close to what I'm after. However, I don't want lines around each cell, but rather around each variable. Is there a way to do that? In other words, one large red rectangle around the var1 "column" and a similar large red rectangle around the var2 "column," but no horizontal lines between the top and bottom of the viz. – Emily Bovee May 30 '19 at 15:46
  • @EmilyBovee Sorry, I misunderstood. I've updated my answer. – Dan May 30 '19 at 16:30
  • That is so great. Thank you so much!! I'll give it a try. – Emily Bovee May 31 '19 at 14:07