I'm analysing some house sale transaction data, and I want to produce a geographic plot with the colour indicating average price per (hex-binned) region. Some regions have limited data, and I want to indicate this by adjusting the opacity to reflect the number of points in each region.
This requires two statistics for each hex: the average price and the number of points. The ggplot2 package makes it very easy to compute and plot one statistic per hex, but I can't figure out how to compute two.
To illustrate the point:
library(ggplot2)
N <- 1000
df_demo <- data.frame(A = runif(N), B = runif(N), C = runif(N)) # dummy data
# I want to produce a hex-binned version of this:
ggplot(data=df_demo) + geom_point(mapping=aes(x=A, y=B, color=C))
# It's easy to get each hex's average price *or* its point density:
ggplot(data=df_demo) + stat_summary_hex(mapping=aes(x=A, y=B, z=C), fun=mean) # fill = mean of C in each hex, but opacity can't also reflect the point count
ggplot(data=df_demo) + geom_hex(mapping=aes(x=A, y=B, color=C, alpha=after_stat(ndensity))) # opacity = normalised point count, but fill is also the point count (C is never averaged), which is wrong
I would like to combine the effects of the last two lines, but that doesn't seem to be possible: stat_summary_hex() only computes a single value per hex, so the ndensity statistic isn't available there, and geom_hex() won't calculate the mean value.
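One way to combine the two without leaving ggplot2 is to let stat_summary_hex() bin the data twice, once with fun = mean and once with fun = length, pull both results out with layer_data(), and replot the merged table with geom_hex(stat = "identity"). This is a minimal sketch, assuming both layers bin identically (they see the same data and the default bins = 30); the intermediate plot names are hypothetical.

```r
library(ggplot2)

set.seed(1)
N <- 1000
df_demo <- data.frame(A = runif(N), B = runif(N), C = runif(N))  # dummy data

# Two throwaway plots that bin the same data the same way: one summarises C
# with mean(), the other with length(), i.e. the number of points per hex
p_base <- ggplot(df_demo, aes(x = A, y = B, z = C))
p_mean <- p_base + stat_summary_hex(fun = mean)
p_n    <- p_base + stat_summary_hex(fun = length)

# layer_data() returns each layer's computed (binned) data; keep the hex
# centres plus the summary value
hex_mean <- layer_data(p_mean)[, c("x", "y", "value")]
hex_n    <- layer_data(p_n)[, c("x", "y", "value")]
names(hex_mean)[3] <- "mean_C"
names(hex_n)[3]    <- "n"

# Identical binning means the hex centres match exactly, so a plain merge works
hex <- merge(hex_mean, hex_n, by = c("x", "y"))

# Colour from the mean, opacity from the point count
p <- ggplot(hex, aes(x = x, y = y, fill = mean_C, alpha = n)) +
  geom_hex(stat = "identity")
p
```

The price of this approach is building two extra plots, but the hexes are guaranteed to be the ones ggplot2 itself would draw.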
Is there a way to do this that I'm overlooking? Alternatively, is there an obvious way to precompute the statistics before constructing the plot, e.g. by assigning each datum to its hex in my dplyr pipeline?
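The precompute-in-dplyr route can be sketched with the CRAN package hexbin, which ggplot2 itself calls for hex binning: hexbin(..., IDs = TRUE) records each point's cell ID in the cID slot, and hcell2xy() gives the centre of every non-empty cell. This is a sketch under assumptions: xbins = 30 is an arbitrary choice, and hexbin's grid will not exactly match the one stat_summary_hex() would choose.

```r
library(ggplot2)
library(dplyr)
library(hexbin)  # on CRAN; ggplot2's hex stats depend on it too

set.seed(1)
N <- 1000
df_demo <- data.frame(A = runif(N), B = runif(N), C = runif(N))  # dummy data

# Assign every point to a hexagonal cell; IDs = TRUE keeps the per-point
# cell ID in hb@cID (xbins = 30 is an assumed resolution)
hb <- hexbin(df_demo$A, df_demo$B, xbins = 30, IDs = TRUE)

# Centres of the non-empty cells, in the same order as hb@cell
centres <- hcell2xy(hb)
cell_xy <- data.frame(cell = hb@cell, x = centres$x, y = centres$y)

# Two statistics per hex: mean of C and number of points
hex_stats <- df_demo %>%
  mutate(cell = hb@cID) %>%
  group_by(cell) %>%
  summarise(mean_C = mean(C), n = n(), .groups = "drop") %>%
  left_join(cell_xy, by = "cell")

# Draw the precomputed hexes as-is: fill from the mean, opacity from the count
p <- ggplot(hex_stats, aes(x = x, y = y, fill = mean_C, alpha = n)) +
  geom_hex(stat = "identity")
p
```

Because the cell ID is just another column, any other per-hex summary (median price, a minimum-count filter) can be added to the same summarise() call.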
One hint that there may not be an easy solution is this non-CRAN package which, if I've understood correctly, solves more or less this problem. However, I'd rather not rely on code from outside CRAN if at all possible, so I'm holding onto hope that I've missed something obvious.