
I have measurements of a quantity (value) at specific points (lon and lat), like the example data below:

library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15), 
                  lat = runif(1000, 40, 60), 
                  value = rnorm(1000))

I want to show a 2D spatial summary (e.g. the mean) of the measured values as color and, on top of that, show the counts as labels.

I can make the summary plot and the label plot separately:

## Left plot
ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex")
## Right plot
ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text")


But when I combine both, I lose the summary:

ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text")


I can achieve the opposite, counts as color and summary as labels:

ggplot(dat, aes(lon, lat, z = value)) +
  geom_hex(bins = 5) +
  stat_summary_hex(aes(label=..value..), bins = 5, 
                   fun = function(x) round(mean(x), 3), 
                   geom = "text")


alko989

4 Answers


While writing the question, which took some hours of testing, I found a solution: adding `fill = NULL` (or `fill = mean(value)`) to the aesthetics of the text layer gives me what I want. Below are the code and the resulting plots; the only difference between them is the legend title.

But it feels very hacky, so I would appreciate a better solution.

ggplot(dat) +
  aes(x = lon, y = lat, z = value)  +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_binhex(aes(label = ..count.., fill = NULL), bins = 5, geom = "text") +
  theme_bw()



ggplot(dat) +
  aes(x = lon, y = lat, z = value)  +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_binhex(aes(label = ..count.., fill = mean(value)), bins = 5, geom = "text") +
  theme_bw()


alko989
    I'd argue using `fill = NULL` is the "correct" answer here. `StatBinhex` has a default aesthetic mapping of `fill = after_stat(count)` (which then merges the range with that of `stat_summary_hex()` for drawing the guide, as others have said). By specifying `fill = NULL` you're explicitly removing that mapping. – Mikko Marttila Feb 17 '22 at 22:24
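To see the default mapping the comment refers to, you can inspect the stat's `default_aes` directly. The sketch below assumes ggplot2 3.3.0 or later (for `after_stat()`) and an installed hexbin package; it uses `stat_bin_hex()`, the current name of `stat_binhex()`, and `after_stat(count)` in place of the deprecated `..count..`. Otherwise it is the same `fill = NULL` fix as in the answer above:

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15),
                  lat = runif(1000, 40, 60),
                  value = rnorm(1000))

# The stat behind stat_bin_hex() maps fill to the computed count by default;
# that mapping is what joins the fill scale shared with stat_summary_hex().
print(StatBinhex$default_aes)

# fill = NULL drops that default mapping for the text layer only
p <- ggplot(dat, aes(x = lon, y = lat, z = value)) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_bin_hex(aes(label = after_stat(count), fill = NULL),
               bins = 5, geom = "text") +
  theme_bw()

# Per-hexagon counts used as labels by the second layer
head(layer_data(p, 2)$label)
```

Drawing `p` (e.g. with `print(p)`) should give the same picture as the `fill = NULL` version above.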

I propose a completely different approach to this problem, but first one point needs clarifying. You write "I have measurements of a quantity (value) at specific points (lon and lat)", but you do not specify these points exactly: your generated data contains 1000 distinct lon values and 1000 distinct lat values.

Anyway, see for yourself.

library(tidyverse)

set.seed(1)
dat <- 
  tibble(
    lon = runif(1000, 1, 15), 
    lat = runif(1000, 40, 60), 
    value = rnorm(1000)
  ) 

dat %>% distinct(lon) %>% nrow() #1000
dat %>% distinct(lat) %>% nrow() #1000

My guess is that for real data you have a much smaller set of lon and lat values. Let me round them to a grid with a spacing of 2.

grid = 2

# round to the grid, then summarise each grid point
datg = dat %>% mutate(
    lon = round(lon/grid)*grid,
    lat = round(lat/grid)*grid
  ) %>% 
  group_by(lon, lat) %>% 
  summarise(
    mean = mean(value),
    label = n()
  )

After rounding, the data are grouped by these two variables, and the statistics you are interested in (the mean and the number of observations) are computed for each group.

Also note that these statistics are computed at the intersections of the rounded lon and lat values, so we get a square grid. In your solution this is not the case at all: you are not getting the number of observations at these points, and your grid is not square.

So let's make a graph.

datg %>% ggplot(aes(lon,lat,z=mean)) + 
  geom_contour_filled(binwidth = 0.25) + 
  geom_text(aes(label = label)) + 
  theme_bw()
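The rounding step maps each coordinate to the nearest multiple of `grid`. As a hedged illustration in base R (swapping dplyr's `group_by()`/`summarise()` for `aggregate()`; the helper `snap_to_grid()` and the toy data are hypothetical), note that R's `round()` rounds exact halves to the even digit, so a point lying exactly halfway between two grid lines is not always rounded up:

```r
# Hypothetical helper: snap coordinates to the nearest multiple of `grid`
snap_to_grid <- function(x, grid) round(x / grid) * grid

# 1.2/2 = 0.6 -> 1 -> 2; 3.1/2 = 1.55 -> 2 -> 4; 14.9/2 = 7.45 -> 7 -> 14
snap_to_grid(c(1.2, 3.1, 14.9), 2)  # -> 2 4 14

# round() is round-half-to-even: 1/2 = 0.5 -> 0, 3/2 = 1.5 -> 2
snap_to_grid(c(1, 3), 2)            # -> 0 4

# Made-up toy data, summarised per grid point (mean and count)
toy <- data.frame(lon = c(1.2, 1.4, 3.1, 3.3),
                  lat = c(40.1, 40.6, 41.9, 42.2),
                  value = c(1, 3, 10, 20))
toy$lon <- snap_to_grid(toy$lon, 2)
toy$lat <- snap_to_grid(toy$lat, 2)
cells <- aggregate(value ~ lon + lat, data = toy,
                   FUN = function(v) c(mean = mean(v), n = length(v)))
print(cells)
```

Here the four toy points collapse onto two grid points, (2, 40) and (4, 42), each with two observations.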


Nothing prevents you from making the grid a bit coarser, say 4.

grid = 4

datg = dat %>% mutate(
  lon = round(lon/grid)*grid,
  lat = round(lat/grid)*grid,
) %>% 
  group_by(lon, lat) %>% 
  summarise(
    mean = mean(value),
    label = n()
  )

datg %>% ggplot(aes(lon,lat,z=mean)) + 
  geom_contour_filled(binwidth = 0.25) + 
  geom_text(aes(label = label)) + 
  theme_bw()
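Since the same pipeline is repeated for each grid size, it can be wrapped in a function. This is a sketch assuming dplyr 1.0.0 or later (for `.groups`); `grid_summary()` is a hypothetical name, not part of any package:

```r
library(dplyr)

# Hypothetical helper: snap lon/lat to a square grid with spacing `grid`,
# then compute the mean value and the number of observations per grid point.
grid_summary <- function(data, grid) {
  data %>%
    mutate(lon = round(lon / grid) * grid,
           lat = round(lat / grid) * grid) %>%
    group_by(lon, lat) %>%
    summarise(mean = mean(value),
              label = n(),
              .groups = "drop")  # return an ungrouped tibble
}

set.seed(1)
dat <- tibble(lon = runif(1000, 1, 15),
              lat = runif(1000, 40, 60),
              value = rnorm(1000))

datg <- grid_summary(dat, grid = 4)
head(datg)
```

Each call returns one row per occupied grid point, so the labels add up to the number of rows in `dat`. The result can be passed to the same `geom_contour_filled()` / `geom_text()` code as above.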


With this approach we can easily add more information to the labels at the points of interest, e.g. the mean value. This time we will use grid = 1.5.

grid = 1.5

datg = dat %>% mutate(
  lon = round(lon/grid)*grid,
  lat = round(lat/grid)*grid,
) %>% 
  group_by(lon, lat) %>% 
  summarise(
    mean = mean(value),
    label = n(),
    lab2 = paste0("(", round(mean, 2), ")")
  )

datg %>% ggplot(aes(lon,lat,z=mean)) + 
  geom_contour_filled(binwidth = 0.25) + 
  geom_text(aes(label = label)) + 
  geom_text(aes(label = lab2), nudge_y = -.5, size = 3) + 
  theme_bw()


I hope this solution fits your needs better than the stat_binhex-based one.

Marek Fiołka

The problem here is that both layers share the same fill scale, and therefore the same legend.

As the ranges are different (0 to 40 for the counts vs. -1.5 to 0.5 for the means), the larger range makes all values in the smaller range appear in (almost) the same color.

This is why displaying the counts as color works, while the opposite doesn't seem to.
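To check this explanation on the question's data, you can compare what each layer feeds into the shared fill scale using `layer_data()`. A sketch, assuming the hexbin package is installed; `value` is the variable computed by `stat_summary_hex()` and `count` the one computed by `stat_bin_hex()` (the current name of `stat_binhex()`):

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15),
                  lat = runif(1000, 40, 60),
                  value = rnorm(1000))

# The combined plot from the question, without the fill = NULL fix
p <- ggplot(dat, aes(x = lon, y = lat, z = value)) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_bin_hex(aes(label = after_stat(count)), bins = 5, geom = "text")

# Range of the per-hexagon means (mapped to fill by layer 1)
mean_range <- range(layer_data(p, 1)$value)
# Range of the per-hexagon counts (mapped to fill by layer 2's default aes)
count_range <- range(layer_data(p, 2)$count)

print(mean_range)
print(count_range)
```

Because one continuous fill scale is trained on both ranges, the much wider count range is what stretches the color gradient.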

As an illustration, if you rescale the mean, the color variations become visible:

rescaled_mean <- function(x) mean(x)*40

ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_summary_hex(bins = 5, fun = "rescaled_mean", geom = "hex") +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text") +
  theme_bw()


Waldi
  • Thanks for the reply, but the resulting plot is not what I need. I don't want to see the ranges of a rescaled mean in the legend. Also, finding the arbitrary `40` is not a good approach, as each data set will have a different magic number; setting `fill=mean(value)` or `fill=NULL` works much better. – alko989 Feb 14 '22 at 20:24
  • This was just an illustration of the reason for the problem you have, and in no way a generic solution. I find your solutions are a good compromise because AFAIK `ggplot` isn't that easy with management of 2 scales on the same graph, see https://stackoverflow.com/a/3101876/13513328 – Waldi Feb 14 '22 at 22:00

To be fair, I find this very strange behaviour. I like your solution, though: I really don't find adding fill = NULL hacky. On the contrary, I find it very elegant. Here is a hackier approach that gives basically the same result, at the cost of one more line. It uses ggnewscale.

library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15), 
                  lat = runif(1000, 40, 60), 
                  value = rnorm(1000))
ggplot(dat) +
  aes(x = lon, y = lat,z = value) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  ggnewscale::new_scale_fill() +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text")

Created on 2022-02-17 by the reprex package (v2.0.1)

tjebo