2D summary plot with counts as labels

Question

I have measurements of a quantity (value) at specific points (lon and lat), like the example data below:

library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15), 
                  lat = runif(1000, 40, 60), 
                  value = rnorm(1000))

I want to make a 2D summary (e.g. mean) of the measured values with color in space and on top of that I want to show the counts as labels.

I can plot the summary and the labels separately:

## Left plot
ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex")
## Right plot
ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text")

But when I combine both, I lose the summary:

ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text")

I can achieve the opposite, counts as color and summary as labels:

ggplot(dat, aes(lon, lat, z = value)) +
  geom_hex(bins = 5) +
  stat_summary_hex(aes(label=..value..), bins = 5, 
                   fun = function(x) round(mean(x), 3), 
                   geom = "text")

score 6 · Accepted Answer · answered Feb 08 '22 at 22:09

While writing the question, which took some hours of testing, I found a solution: adding `fill = NULL` or `fill = mean(value)` to the aesthetics of the text layer gives me what I want. The code and the resulting plots are below; the only difference between them is the title of the legend.

But it feels very hacky, so I would appreciate a better solution.

ggplot(dat) +
  aes(x = lon, y = lat, z = value)  +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_binhex(aes(label = ..count.., fill = NULL), bins = 5, geom = "text") +
  theme_bw()



ggplot(dat) +
  aes(x = lon, y = lat, z = value)  +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_binhex(aes(label = ..count.., fill = mean(value)), bins = 5, geom = "text") +
  theme_bw()

I'd argue using `fill = NULL` is the "correct" answer here. `StatBinhex` has a default aesthetic mapping of `fill = after_stat(count)` (which then merges the range with that of `stat_summary_hex()` for drawing the guide, as others have said). By specifying `fill = NULL` you're explicitly removing that mapping. — Mikko Marttila, Feb 17 '22 at 22:24
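The default mapping mentioned in the comment above can be checked directly. This is a minimal sketch, assuming a reasonably recent ggplot2 in which `StatBinhex` is exported; the exact printed output may differ between versions, and `default_map` and `has_fill_default` are just illustrative names.

```r
library(ggplot2)

# Look at the aesthetics that stat_binhex() maps by default.
# According to the comment above, this should include a fill mapping.
default_map <- StatBinhex$default_aes
print(default_map)

# TRUE if the stat brings its own fill mapping along
has_fill_default <- "fill" %in% names(default_map)
has_fill_default
```

If `fill` shows up there, setting `fill = NULL` in the layer's `aes()` is simply switching off that default for the text layer.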

score 2 · Answer 2 · answered Feb 20 '22 at 19:54

I propose a completely different approach to this problem. However, it needs to be clarified a bit first. You write "I have measurements of a quantity (value) at specific points (lon and lat)" but you do not specify these points exactly. Your (generated) data contains 1000 distinct lon values and the same number of distinct lat values.

Anyway, see for yourself.

library(tidyverse)

set.seed(1)
dat <- 
  tibble(
    lon = runif(1000, 1, 15), 
    lat = runif(1000, 40, 60), 
    value = rnorm(1000)
  ) 

dat %>% distinct(lon) %>% nrow() #1000
dat %>% distinct(lat) %>% nrow() #1000

My guess is that for real data you have a much smaller set of values for lon and lat. Let me round the coordinates to a grid spacing of 2.

grid = 2

datg = dat %>% mutate(
    lon = round(lon/grid)*grid,
    lat = round(lat/grid)*grid,
  ) %>% 
  group_by(lon, lat) %>% 
  summarise(
    mean = mean(value),
    label = n()
  )
datg

As you can see, after rounding, the data was grouped by these two variables and then I calculated the statistics you are interested in (mean and number of observations).

Also note that these statistics are generated at the intersections of lon and lat, so we have a square grid. In your solution, this is not the case at all. You are not getting the number of observations at these points and your grid is not square.

So let's make a graph.

datg %>% ggplot(aes(lon,lat,z=mean)) + 
  geom_contour_filled(binwidth = 0.25) + 
  geom_text(aes(label = label)) + 
  theme_bw()

Nothing stands in the way of making the grid a bit coarser, say 4.

grid = 4

datg = dat %>% mutate(
  lon = round(lon/grid)*grid,
  lat = round(lat/grid)*grid,
) %>% 
  group_by(lon, lat) %>% 
  summarise(
    mean = mean(value),
    label = n()
  )

datg %>% ggplot(aes(lon,lat,z=mean)) + 
  geom_contour_filled(binwidth = 0.25) + 
  geom_text(aes(label = label)) + 
  theme_bw()

Using such a solution, we can easily supplement the labels in the points of interest to us, e.g. with the average value. This time we will use grid = 1.5.

grid = 1.5

datg = dat %>% mutate(
  lon = round(lon/grid)*grid,
  lat = round(lat/grid)*grid,
) %>% 
  group_by(lon, lat) %>% 
  summarise(
    mean = mean(value),
    label = n(),
    lab2 = paste0("(", round(mean, 2), ")")
  )

datg %>% ggplot(aes(lon,lat,z=mean)) + 
  geom_contour_filled(binwidth = 0.25) + 
  geom_text(aes(label = label)) + 
  geom_text(aes(label = lab2), nudge_y = -.5, size = 3) + 
  theme_bw()

Hope this solution fits your needs much better than the stat_binhex based solution.

Waldi · Answer 3 · score 1 · 2022-02-14T20:58:32.350

The problem here is that both layers share the same fill scale, and therefore the same legend.

Because the ranges differ (0 to 40 for the counts vs. -1.5 to 0.5 for the means), the larger range dominates and all values in the smaller range appear in (almost) the same color.

This is why displaying count as color works, but the opposite doesn't seem to work.
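To put rough numbers on this, here is a back-of-the-envelope sketch in base R. It assumes the approximate ranges quoted above (read off the plots, not computed from the data) and a linear color scale; the variable names are illustrative only.

```r
# Approximate ranges quoted above (assumed, not computed from dat)
mean_range  <- c(-1.5, 0.5)   # hex means
count_range <- c(0, 40)       # hex counts

# Both layers train the same fill scale, so its limits cover both ranges
shared <- range(c(mean_range, count_range))

# Position of the smallest and largest mean on the shared 0-1 scale
pos <- (mean_range - shared[1]) / diff(shared)
pos
```

All the means end up squeezed into roughly the lowest 5% of the shared scale, which is why the hexagons look nearly uniform.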

As an illustration, if you rescale the mean calculation, the color variations become visible:

rescaled_mean <- function(x) mean(x) * 40

ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_summary_hex(bins = 5, fun = "rescaled_mean", geom = "hex") +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text") +
  theme_bw()

edited Feb 14 '22 at 20:58

answered Feb 14 '22 at 19:32


Thanks for the reply, but the resulting plot is not what I need. I don't want to see the ranges of a rescaled mean in the legend. Also, finding the arbitrary `40` is not a good approach, as each dataset will have a different magic number. Setting `fill = mean(value)` or `fill = NULL` works much better. – alko989 Feb 14 '22 at 20:24
This was just an illustration of the reason for the problem you have, and in no way a generic solution. I find your solutions are a good compromise because AFAIK `ggplot` isn't that easy with management of 2 scales on the same graph, see https://stackoverflow.com/a/3101876/13513328 – Waldi Feb 14 '22 at 22:00

tjebo · Answer 4 · 2022-02-17T21:46:39.047

To be fair, I find this very strange behaviour. I like your solution though; I really don't find adding fill = NULL hacky. On the contrary, I find it very elegant. Here is a more hacky approach that gives basically the same result but needs one more line. It uses ggnewscale.

library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15), 
                  lat = runif(1000, 40, 60), 
                  value = rnorm(1000))
ggplot(dat) +
  aes(x = lon, y = lat,z = value) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  ggnewscale::new_scale_fill() +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text")

Created on 2022-02-17 by the reprex package (v2.0.1)

Thanks for that, I also find it strange that a text geom would have a fill scale. — alko989, Feb 18 '22 at 09:12
