I'm trying to overlay convex hulls on top of noisy data. I only want the hulls on the main clusters (and not the red outliers).
How would I plot every other hull while plotting all the points?
In my workaround attempt, I accidentally made all the outliers disappear. Additionally, the shapes (21-25) were distorted when plotting my actual data in a Shiny app.
Would this be solved by tinkering when building the stat, or by editing the mapping?
Requirements: I'd also like to keep things to ggplot2, since this will all be wrapped up in a Shiny app and the hulls will only be plotted if the user clicks on a checkbox. The number of clusters varies per graph, but the first one is ALWAYS the outlier.
Data Generation
library(dbscan)
library(ggplot2)
data("DS3")
DS3_cl <- hdbscan(DS3, minPts = 25)
DS3_comb <- DS3
DS3_comb$cluster <- as.character(DS3_cl$cluster)
Graph functions/parameters
cols <- c('#e41a1c','#377eb8','#4daf4a','#984ea3','#ff7f00','#a65628','#f781bf','#999999','#ffff33')
cols_2 <- c(NA,'#377eb8','#4daf4a','#984ea3','#ff7f00','#a65628','#f781bf','#999999','#ffff33')
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  },
  # Do the outlier removal around here?
  required_aes = c("x", "y")
)
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE, show.legend = NA,
                       inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}
Plots:
plot_1 <- ggplot(data = DS3_comb, aes(X, Y, color = cluster)) +
  geom_point(alpha = 0.4) +
  scale_color_manual(values = cols) +
  theme_bw()
plot_2 <- plot_1 + stat_chull(fill = NA)
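One direction I have been considering, sketched here only as an assumption and not a confirmed fix, is to leave the stat and the colour mapping alone and give the hull layer data that no longer contains the outlier cluster. The helper `hull_data()` below is hypothetical (it is not part of ggplot2 or dbscan); it assumes columns named `X`, `Y` and `cluster`, and that the outlier label is `"0"`, which is how `hdbscan()` marks noise points. The toy data frame only stands in for `DS3_comb` so the snippet runs on its own in base R.

```r
# Hypothetical helper: convex-hull vertices per cluster, skipping the outliers.
# Assumes columns X, Y, cluster and that `outlier` is the noise label ("0").
hull_data <- function(df, outlier = "0") {
  keep <- df[df$cluster != outlier, , drop = FALSE]
  # One hull per remaining cluster; drop = TRUE skips unused factor levels
  pieces <- lapply(split(keep, keep$cluster, drop = TRUE), function(g) {
    g[grDevices::chull(g$X, g$Y), , drop = FALSE]
  })
  do.call(rbind, pieces)
}

# Toy stand-in for DS3_comb: two square clusters, one interior point,
# and a single outlier labelled "0"
toy <- data.frame(
  X = c(0, 1, 1, 0, 0.5, 5, 6, 6, 5, 9),
  Y = c(0, 0, 1, 1, 0.5, 5, 5, 6, 6, 0),
  cluster = c("1", "1", "1", "1", "1", "2", "2", "2", "2", "0")
)

hulls <- hull_data(toy)
print(hulls)
```

With the real data this would presumably be used as `plot_1 + geom_polygon(data = hull_data(DS3_comb), fill = NA)`, so the points layer still draws every cluster, outliers included. Since a layer's `data` argument can also be a function of the plot data, `stat_chull(data = function(d) d[d$cluster != "0", ], fill = NA)` might achieve the same with the custom stat, but I have not verified either variant inside the Shiny app.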
Workaround Attempt:
plot_3 <- ggplot(data = DS3_comb, aes(X, Y, fill = cluster, color = cluster)) +
  geom_point(shape = 21, alpha = 0.4) +
  scale_fill_manual(values = cols) +
  scale_color_manual(values = cols_2) +
  theme_bw()
Sources consulted:
- R : pass Graph as parameter to a function
- https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html
- stat_function and legends: create plot with two separate colour legends mapped to different variables
- Just started reading, but could be problematic with Shiny: https://github.com/rstudio/shiny/issues/1431