1

I want to plot a bar for each x value and each group, where each bar is individually filled with a vertical gradient by the number of observations with a same y value.

For more context: I wish to explain how the number of alive sons g="a" or born sons g="b" affects the salary y of an individual. For that i first plotted this scatter plot:

ggplot(df, aes(y=y, x=x, color=g)) + 
  geom_point(position=position_dodge(0.2), alpha=0.3) +
  geom_smooth(method="lm", alpha=0.3)

enter image description here

As there is an alpha argument, places with lots of dots get darker, giving a sense of density. But i'd like to directly plot that density with something like:

enter image description here

I guess that this could be described as "a violin plot, but instead of larger figure we have a darker color". My question is how to get something like that.

The most important feature is the fill. The bars can have a non-fixed size, and if there's another geom that has this feature i'm probably good with it.

I came across this question (Create barplot in R with gradient based on density) were you'd create a rect and fill it with stat_ydensity, but in that case, he only had a x variable and not a group one, so i couldn't find how to apply it to my problem.

What I tried:

Changing the geom to "bar" but got this:

ggplot(df, aes(y=y, x=x, color=g)) +
  stat_ydensity(geom="bar", aes(fill=stat(density)))

enter image description here

Counting how many observations with the same x and g had a y value with

library(dplyr)
df <- df %>% group_by_all() %>% summarise(COUNT = n())

And then try to pass COUNT as the fill argument to some geom, but also didn't got good results.

Data:

df <- structure(list(y = c(1000, 1000, 510, 510, 650, 650, 510, 510, 
350, 350, 800, 800, 800, 800, 150, 150, 2000, 2000, 800, 800, 
600, 600, 1700, 1700, 600, 600, 600, 600, 400, 400, 1000, 1000, 
600, 600, 600, 600, 700, 700, 510, 510, 0, 0, 800, 800, 250, 
250, 730, 730, 800, 800, 2500, 2500, 750, 750, 800, 800, 2500, 
2500, 1500, 1500, 1000, 1000, 3800, 3800, 3600, 3600, 1000, 1000, 
320, 320, 1450, 1450, 600, 600, 800, 800, 1500, 1500, 1200, 1200, 
600, 600, 510, 510, 600, 600, 1050, 1050, 1300, 1300, 500, 500, 
2000, 2000, 750, 750, 1260, 1260, 1500, 1500, 510, 510, 510, 
510, 4000, 4000, 4000, 4000, 960, 960, 820, 820, 400, 400, 1600, 
1600, 510, 510, 1200, 1200, 600, 600, 900, 900, 1010, 1010, 3500, 
3500, 3500, 3500, 2700, 2700, 1100, 1100, 700, 700, 560, 560, 
1700, 1700, 470, 470, 1500, 1500, 2000, 2000, 7000, 7000, 2000, 
2000, 2000, 2000, 700, 700, 600, 600, 1200, 1200, 850, 850, 510, 
510, 600, 600, 2000, 2000, 1500, 1500, 1000, 1000, 1500, 1500, 
1500, 1500, 5000, 5000, 5000, 5000, 5000, 5000, 800, 800, 200, 
200, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 2000, 2000, 
420, 420, 250, 250, 510, 510, 600, 600, 1510, 1510, 1500, 1500, 
998, 998, 870, 870, 900, 900, 900, 900, 5500, 5500, 2800, 2800, 
2000, 2000, 510, 510, 905, 905, 1020, 1020, 1030, 1030, 1020, 
1020, 2040, 2040, 2800, 2800, 1800, 1800, 650, 650, 2000, 2000, 
700, 700, 800, 800, 900, 900, 1300, 1300, 600, 600, 400, 400, 
800, 800, 700, 700, 510, 510, 460, 460, 223, 223, 1000, 1000, 
600, 600, 510, 510, 510, 510, 720, 720, 650, 650, 800, 800, 410, 
410, 2000, 2000, 1020, 1020, 940, 940, 1040, 1040, 3000, 3000, 
3000, 3000, 1200, 1200, 1300, 1300, 600, 600, 600, 600, 2100, 
2100, 700, 700, 3300, 3300, 500, 500, 350, 350, 1200, 1200, 2200, 
2200, 800, 800, 700, 700, 1080, 1080, 200, 200, 1400, 1400, 2500, 
2500, 5000, 5000, 800, 800, 572, 572, 832, 832, 2000, 2000, 510, 
510, 1900, 1900, 640, 640, 800, 800, 2000, 2000, 510, 510, 510, 
510, 510, 510, 510, 510, 715, 715, 700, 700, 500, 500, 1000, 
1000, 1000, 1000, 800, 800, 380, 380, 510, 510, 510, 510, 330, 
330, 330, 330, 330, 330, 1700, 1700, 800, 800, 1500, 1500, 2500, 
2500, 150, 150, 188, 188, 1000, 1000, 400, 400, 1300, 1300, 660, 
660, 2000, 2000, 1000, 1000, 1500, 1500, 950, 950, 1000, 1000, 
500, 500, 1610, 1610, 600, 600, 650, 650, 1670, 1670, 600, 600, 
800, 800, 800, 800, 412, 412, 800, 800, 2000, 2000, 1000, 1000, 
1600, 1600, 2000, 2000, 800, 800, 800, 800, 2500, 2500, 500, 
500, 1200, 1200, 800, 800, 800, 800, 2200, 2200, 2000, 2000, 
600, 600, 500, 500, 3000, 3000, 600, 600, 1000, 1000, 250, 250, 
1000, 1000, 1200, 1200, 1500, 1500, 2500, 2500, 400, 400, 8000, 
8000, 1800, 1800, 1700, 1700, 2000, 2000, 900, 900, 600, 600, 
700, 700, 255, 255, 545, 545, 2800, 2800, 1500, 1500, 600, 600, 
900, 900), g = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("a", 
"b"), class = "factor"), x = c(0, 0, 1, 1, 0, 0, 5, 5, 0, 0, 
0, 0, 0, 0, 2, 2, 0, 0, 4, 4, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 
2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 3, 3, 3, 3, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 5, 5, 
0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 0, 0, 0, 
0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 
0, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 3, 3, 0, 0, 0, 0, 6, 5, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
1, 1, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 6, 
5, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 3, 3, 4, 4, 
2, 2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
2, 2, 0, 0, 2, 2, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 0, 0, 0, 
0, 1, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 2, 2, 0, 0, 3, 3, 0, 0, 0, 0, 2, 2, 0, 0, 1, 1, 0, 0, 0, 
0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 
3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0, 0, 0, 0, 0, 0, 1, 1, 0, 
0, 0, 0, 0, 0, 6, 6, 0, 0, 0, 0, 0, 0, 5, 5, 3, 3, 0, 0, 0, 0, 
1, 1, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 
2, 0, 0, 0, 0, 1, 1, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 
0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 
0, 0, 1, 1, 3, 3, 3, 3, 0, 0, 2, 2, 0, 0, 1, 1, 0, 0, 0, 0, 1, 
1, 0, 0, 1, 1, 2, 2)), row.names = c(NA, -500L), class = c("tbl_df", 
"tbl", "data.frame"))
  • Another link that might be usefull is [this SO post](https://stackoverflow.com/questions/48210231/creating-a-vertical-color-gradient-for-a-geom-bar-plot). – Rui Barradas Nov 02 '20 at 16:23

1 Answers1

2

This is fairly tricky, but possible:

library(ggplot2)
library(ggnewscale)

df2 <- do.call(rbind, lapply(split(df, interaction(df$x, df$g)), function(d) {
  dens <- density(d$y, bw = 1000, from = 0, to = 8000, n = 80)
  data.frame(x = d$x[1], g = d$g[1], y = dens$y)
  }))

df2$x <- df2$x + (as.numeric(as.factor(df2$g)) - 1.5) * 0.4

ggplot(df2[df2$g == "b",], aes(x, y = 100, fill = y, group = g)) + 
  geom_col(orientation = "x", width = 0.2) +
  scale_fill_gradientn(colours = c("white", "dodgerblue"), name = "b") +
  new_scale_fill() +
  geom_col(data = df2[df2$g == "a",], orientation = "x", width = 0.2,
           aes(fill = y)) +
  scale_fill_gradientn(colours = c("white", "orange"), name = "a") +
  geom_smooth(data = df, aes(x = x, y = y, group = g), method = "lm", 
              alpha = 0.3) +
  theme_bw() 

enter image description here

If you want the bars outlined, you would need to add a geom_rect around each one.

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Nice! Is there a way for the densities not be relative to the values of g&x, but to the whole population? For example, the last bars have almost no observations, so they should be lighter. – Ricardo Semião e Castro Nov 02 '20 at 20:37
  • 1
    @RicardoSemiãoeCastro you could change the value of `y` inside the `lapply` call to `y = dens$y * nrow(d)/nrow(df)` – Allan Cameron Nov 02 '20 at 20:46