In my Shiny data quality assessment application, users upload datasets and can generate reports using RMarkdown. One of those reports is a visual summary of the dataset, rating each variable of the dataset on three metrics: % Compliance (how well the data match the data model), % Logic (is death date after birth date?), and % Completeness. Each patient in the dataset belongs to a GROUP, and the metrics are reported by GROUP. Each dataset includes multiple data tables, and so a separate grid of heatmaps (on a separate page) is created for each data table. Some users have lots of GROUPs in their data, while others have only a few. Similarly, the number of variables in a table vary as well.
How can I standardize the look of these faceted heatmaps across these different circumstances? I am looking for a way to specify the height and width in cm for each heatmap cell, but haven't found a good solution. The two plots below illustrate the problem.
1. Faceted Heatmap with small number of patient groups in dataset (cells are too big!):
Using coord_fixed()
doesn't correct the issue nor does coord_equal()
. It still results in large cells for the first scenario and even tinier ones for the second.
Below is a reproducible example to generate fake data and the resulting faceted heatmaps for a variety of numbers of variables and patient groups. (e.g., makeWrappedHeatmaps(numGroups = 20, numVars = 5)
will create faceted set of 20 heatmaps summarizing metrics for 5 variables)
library(tidyverse)
library(data.table)
createFakeVariableMetrics <- function(varName, numGroups){
tibble(GROUP = paste("Group", LETTERS[1:numGroups]),
Variable = rep(varName, numGroups),
Compliant = round(runif(numGroups, 20, 100), 0),
Logic = round(runif(numGroups, 0, 100), 0),
Complete = round(runif(numGroups, 25, 100), 0)
)
}
# example possible variable names
variableNames <- c("PATIENT", "BIRTH_D", "MED_CODE", "DROP_Y", "DX_D", "VIS_D", "DX_CODE", "RS_CODE", "LAB_V")
makeWrappedHeatmaps <- function(numGroups, numVars){
if (numVars > length(variableNames)) return("Specify a number of variables less than 10")
varNames <- variableNames[1:numVars]
dfList <- lapply(varNames, createFakeVariableMetrics, numGroups)
metricdf <- rbindlist(dfList)
forHeatmap <- gather(metricdf, c("Compliant", "Logic", "Complete"), key = "Metric", value = "percent")
ggplot(data = forHeatmap, aes(x = Metric, y = Variable)) +
geom_tile(aes(fill = percent), colour = "white", size = 0.01) +
facet_wrap("GROUP") +
labs(x="",y="") +
geom_text(aes(label = percent), size = 3) +
scale_y_discrete(expand=c(0,0))+
scale_x_discrete(expand=c(0,0), position = "top") +
theme(
axis.text.x = element_text(size = 8),
plot.background = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.x = element_blank(),
panel.border = element_blank(),
legend.position = "none")
}
Thank you!