How do I make multiple plots of the same data, colored differently by different factors (columns), while recycling the data? Is this what gridExtra does differently than cowplot?
Objective: My objective is to visually compare different results of clustering the same data efficiently. I currently believe the easiest way to compare 2-4 clustering algorithms visually is to have them plotted next to each other.
Thus, how do I plot the same data side by side colored differently?
Challenge/Specifications: Performance is very important. I have roughly 30,000 graphs to make, each with 450-480 points. It is critical that the data is "recycled," i.e. shared between the plots rather than copied for each one.
I am able to plot them side by side using the packages cowplot and gridExtra. I just started using gridExtra today, but it seems to recycle data and to be better than cowplot for my purposes. Update: u/eipi10 demonstrated that facet_wrap could work if I gathered the columns before plotting.
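For reference, the facet_wrap idea from the update might look roughly like the sketch below. It assumes tidyr is installed and uses pivot_longer to do the gathering; the column names `method` and `cluster` are made up for illustration, and the toy data is repeated so the snippet runs on its own. Note that the long data frame has one row per point per clustering, so the coordinates are repeated inside a single plot object rather than shared between several.

```r
library(ggplot2)
library(tidyr)

# Same toy data as in the question, repeated so this runs standalone
df <- data.frame(
  x.points = c(1, 1, 1, 3, 3, 3, 5, 5, 5),
  y.points = c(1, 3, 5, 1, 3, 5, 1, 3, 5),
  cl_vert  = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  cl_hoz   = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  cl_cent  = c("A", "A", "A", "A", "B", "A", "A", "A", "A")
)

# Gather the clustering columns into long format:
# one row per point per clustering result.
# "method" and "cluster" are illustrative names, not from the original post.
df_long <- pivot_longer(
  df,
  cols      = c(cl_vert, cl_hoz, cl_cent),
  names_to  = "method",
  values_to = "cluster"
)

# One ggplot object, one panel per clustering result
p_facet <- ggplot(df_long, aes(x = x.points, y = y.points, color = cluster)) +
  geom_point() +
  scale_color_brewer(palette = "Set1") +
  facet_wrap(~ method, ncol = 2) +
  labs(title = "Data Ex 1", x = "log(X)", y = "log(Y)", color = "Color") +
  theme_bw()

# print(p_facet)  # uncomment to draw
```

This also gives a single shared legend without extra work, at the cost of panel titles being the raw column names unless a labeller is supplied.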
Set up
#Packages
library(ggplot2)
library(cowplot)
library(gridExtra)
library(pryr) #memory profile
#Data creation
x.points <- c(1, 1, 1, 3, 3, 3, 5, 5, 5)
y.points <- c(1, 3, 5, 1, 3, 5, 1, 3, 5)
cl_vert <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
cl_hoz <- c("A", "B", "C", "A", "B", "C", "A", "B", "C")
cl_cent <- c("A","A","A","A", "B", "A","A","A","A")
df <- data.frame(x.points, y.points, cl_vert, cl_hoz, cl_cent)
Graphing them
#Graph function and individual plots
graph <- function(data = df, Title = "", color.by, legend.position = "none"){
  ggplot(data, aes(x = `x.points`, y = `y.points`)) +
    geom_point(aes(color = as.factor(color.by))) + scale_color_brewer(palette = "Set1") +
    labs(subtitle = Title, x = "log(X)", y = "log(Y)", color = "Color" ) +
    theme_bw() + theme(legend.position = legend.position)
}
g1 <- graph(Title = "Vertical", color.by = cl_vert)
g2 <- graph(Title = "Horizontal", color.by = cl_hoz)
g3 <- graph(Title = "Center", color.by = cl_cent)
#Cowplot
legend <- get_legend(graph(color.by = cl_vert, legend.position = "right")) #Not a memory waste
plot <- plot_grid(g1, g2, g3, labels = c("A", "B", "C"))
title <- ggdraw() + draw_label(paste0("Data Ex ", "1"), fontface = 'bold')
plot2 <- plot_grid(title, plot, ncol=1, rel_heights=c(0.1, 1)) # rel_heights values control title margins
plot3 <- plot_grid(plot2, legend, rel_widths = c(1, 0.3))
plot3
#gridExtra
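plot_grid.ex <- grid.arrange(g1, g2, g3, ncol = 2, top = paste0("Data Ex ", "1")) # draws immediately and returns a gtable
plot_grid.ex # prints the gtable's layout summary; grid::grid.draw(plot_grid.ex) would redraw it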
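A possible variation on the gridExtra approach, sketched under a few assumptions: the helper `graph_col` below is hypothetical (it is not the `graph` function above), it colors by a column name via ggplot2's `.data` pronoun so each plot only refers to columns of the data frame, and it uses arrangeGrob, which builds the combined layout without drawing it. I have not measured whether this changes memory use; it is only meant to show the pattern of looping over the clustering columns.

```r
library(ggplot2)
library(gridExtra)

# Same toy data as in the question, repeated so this runs standalone
df <- data.frame(
  x.points = c(1, 1, 1, 3, 3, 3, 5, 5, 5),
  y.points = c(1, 3, 5, 1, 3, 5, 1, 3, 5),
  cl_vert  = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  cl_hoz   = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  cl_cent  = c("A", "A", "A", "A", "B", "A", "A", "A", "A")
)

# Hypothetical helper: `col` is the *name* of the column to color by
graph_col <- function(data, col, title = "") {
  ggplot(data, aes(x = x.points, y = y.points, color = .data[[col]])) +
    geom_point() +
    scale_color_brewer(palette = "Set1") +
    labs(subtitle = title, x = "log(X)", y = "log(Y)", color = "Color") +
    theme_bw() +
    theme(legend.position = "none")
}

# Panel title -> clustering column
cluster_cols <- c(Vertical = "cl_vert", Horizontal = "cl_hoz", Center = "cl_cent")

# One plot per clustering column
plots <- Map(function(col, title) graph_col(df, col, title),
             cluster_cols, names(cluster_cols))

# Build the combined layout without drawing it
combined <- arrangeGrob(grobs = plots, ncol = 2, top = "Data Ex 1")

# grid::grid.draw(combined)  # uncomment to draw
```

Because arrangeGrob does not draw, the result can be built inside a loop and only rendered or saved when needed.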
Memory usage with pryr
#Comparison
object_size(plot_grid) #315 kB (note: plot_grid is cowplot's function; the gridExtra result above is stored in plot_grid.ex)
object_size(plot3) #1.45 MB
#Individual objects
object_size(g1) #756 kB
object_size(g2) #756 kB
object_size(g3) #756 kB
object_size(g1, g2, g3) #888 kB
object_size(legend) #43.6 kB
Additional Questions:
After writing this question and providing sample data, I remembered gridExtra, tried it, and it seems to take up less memory than the combined size of its component graphs. I thought g1, g2, and g3 shared the same data except for the coloring assignment, which was why there was only a roughly 130 kB difference between one individual component (756 kB) and the total size of all three (888 kB). How is it that plot_grid takes up even less space than that? ls.str(plot_grid) doesn't seem to show any consolidation of g1, g2, and g3. Would my best bet be to use lineprof() and run line-by-line comparisons?
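To make the "shared data" idea concrete, here is a minimal sketch with plain lists instead of plots, assuming pryr is installed. The names `shared`, `a`, and `b` are made up for illustration. When object_size is given several objects, it counts memory they have in common only once, which is the same effect seen with object_size(g1, g2, g3) above.

```r
library(pryr)

set.seed(1)
shared <- runif(1e5)  # stand-in for data common to several plots

# Two objects that both point at the same vector; only the small
# "colour" element differs between them
a <- list(data = shared, colour = rep("A", 10))
b <- list(data = shared, colour = rep("B", 10))

size_a    <- object_size(a)     # includes the shared vector
size_b    <- object_size(b)     # includes the shared vector again
size_both <- object_size(a, b)  # shared vector is counted once

print(size_a)
print(size_b)
print(size_both)
```

If the combined size is much smaller than the sum of the individual sizes, most of the memory is shared rather than duplicated.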
Sources I've skimmed/read/consulted:
- http://adv-r.had.co.nz/memory.html #don't fully understand
- Add a common Legend for combined ggplots #to fix gridExtra later
Please bear with me, as I am a new programmer (I only truly started scripting in December); I don't understand all the technical details yet, but I want to.