
I created a boxplot showing the dispersal distance (data_dist2$dist) of several species (data_dist2$spe), and I would like the width of each box to be proportional to the regeneration density of that species. I used varwidth = TRUE together with the weight aesthetic, as shown below, but the result is still not correct: the width still depends on the number of observations, and not only on the regeneration density.

(For the density, I calculated a proportion for each species, so it ranges from about 10 to 100. It is stored in the column data_dist2$prop2.)

p <- ggplot(data_dist2, aes(x = reorder(spe, prop2), y = dist)) + 
  coord_flip() + 
  geom_boxplot(varwidth = TRUE, alpha=0.3, aes(weight=data_dist2$prop2), fill='grey10')

Would you have any idea how to make the boxplot exactly proportional to my prop2 column?

Reproducible example:

structure(list(spe = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), .Label = c("Abies concolor", "Picea abies", "Sequoia semp."
), class = "factor"), dist = c(0, 0, 3, 3, 4, 4, 25, 46, 59, 
113, 113, 9, 12, 12, 12, 15, 22, 22, 22, 22, 35, 35, 36, 49, 
85, 85, 90, 5, 5, 1, 1, 8, 13, 48, 48, 52, 52, 52, 65, 89), prop2 = c(92.17, 
92.17, 92.17, 92.17, 92.17, 92.17, 92.17, 92.17, 92.17, 92.17, 
92.17, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 
10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 100, 100, 100, 100, 
100, 100, 100, 100, 100, 100, 100, 100, 100)), row.names = c(NA, 
-40L), class = "data.frame")

Aurore F
  • Can you provide a reproducible example of your dataset? (Please read this tutorial: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – dc37 Mar 04 '20 at 14:43
  • If `prop2` ranges from 10 to 100 you can set `aes(width = I(prop2 / 100))`. This will set the width to be from 0.1 to 1. – Drumy Mar 04 '20 at 15:00
  • Hello @Drumy, thank you for the answer. Where do you put the width aesthetic? I get this warning: "Ignoring unknown aesthetics: width". – Aurore F Mar 04 '20 at 15:07
  • Oops, I thought `geom_boxplot` accepted `width` as an aesthetic. My bad. The solution is a little more complicated than that, then. Please see my answer below. – Drumy Mar 04 '20 at 15:52

2 Answers


The weight aesthetic doesn't seem to be designed exactly for this, but you can hack it a bit. First, note that the weight given to each group is the sum of the weights of its observations, so if the species have different numbers of observations you may need to replace prop2 with its current value divided by the number of observations in the group. (I can't tell from your example whether this applies.)

Then note that the width is proportional to the square root of the weight, so change your code to reverse that with:

p <- ggplot(data_dist2, aes(x = reorder(spe, prop2), y = dist)) + 
  coord_flip() + 
  # refer to the column by name inside aes(); squaring offsets the square root
  geom_boxplot(varwidth = TRUE, alpha = 0.3, aes(weight = prop2^2), fill = 'grey10')
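
To see how the two remarks above interact, the relative widths can be worked out by hand. This is only a rough sketch in base R: it assumes the rule described in this answer (width proportional to the square root of the summed weights in each group), takes the per-species sample sizes and prop2 values from the question's example data, and uses a hypothetical helper called rel_width that is not part of ggplot2.

```r
# Per-species sample sizes and prop2 values, read off the question's example data
n     <- c("Abies concolor" = 11, "Sequoia semp." = 16, "Picea abies" = 13)
prop2 <- c("Abies concolor" = 92.17, "Sequoia semp." = 10.9, "Picea abies" = 100)

# Assumed rule: relative box width = sqrt(sum of the weights in the group).
# Every observation in a group carries the same weight, so the sum is n * weight.
rel_width <- function(w_per_obs) sqrt(n * w_per_obs)

widths <- rbind(
  weight_prop2      = rel_width(prop2),       # original attempt from the question
  weight_prop2_sq   = rel_width(prop2^2),     # squaring only
  weight_prop2_sq_n = rel_width(prop2^2 / n)  # squaring and dividing by group size
)

# Scale each row so the widest box is 1, then compare with prop2 / max(prop2)
round(widths / apply(widths, 1, max), 3)
round(prop2 / max(prop2), 3)
```

Under that assumption, only the last row reproduces prop2 / max(prop2): squaring alone still leaves a factor of sqrt(n) in the width.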
Miff

Miff beat me to it, but here's my answer anyway. As Miff said, you can weight the width by your prop2.

ggplot(data_dist2, aes(x = reorder(spe, prop2), y = dist)) + 
 geom_boxplot(aes(weight = prop2), 
              varwidth = TRUE,
              fill='grey10', alpha=0.3) +
 coord_flip()


But geom_boxplot() implicitly takes the sample size into account, because the weights are summed within each species. So you need to divide the sample size out of your weights. Here's how you can do it with data.table.

library(data.table)
setDT(data_dist2) # convert to data.table
data_dist2[, weight := prop2 / .N, by = spe] # Divide prop2 by sample size for each species

ggplot(data_dist2, aes(x = reorder(spe, prop2), y = dist)) + 
  geom_boxplot(aes(weight = weight),  # note weight = weight, not weight = prop2
               varwidth = TRUE,
               fill='grey10', alpha=0.3) +
  coord_flip()
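
If the width really scales with the square root of the summed weights, as Miff's answer suggests, then dividing by the sample size and squaring prop2 could be combined. The following is a minimal sketch of that idea in base R plus ggplot2, not a tested recipe: the column names n_obs and w are made up for illustration, the data frame is just the question's example rebuilt in compact form, and as far as I know the weight aesthetic of geom_boxplot() also requires the quantreg package to be installed.

```r
library(ggplot2)

# The question's example data, rebuilt compactly (same 40 rows, same factor levels)
data_dist2 <- data.frame(
  spe = factor(
    rep(c("Abies concolor", "Sequoia semp.", "Picea abies"), times = c(11, 16, 13)),
    levels = c("Abies concolor", "Picea abies", "Sequoia semp.")
  ),
  dist = c(0, 0, 3, 3, 4, 4, 25, 46, 59, 113, 113,
           9, 12, 12, 12, 15, 22, 22, 22, 22, 35, 35, 36, 49, 85, 85, 90,
           5, 5, 1, 1, 8, 13, 48, 48, 52, 52, 52, 65, 89),
  prop2 = rep(c(92.17, 10.9, 100), times = c(11, 16, 13))
)

# Number of observations per species, repeated on every row of that species
data_dist2$n_obs <- ave(data_dist2$dist, data_dist2$spe, FUN = length)

# Per-observation weight chosen so that the weights of a species sum to prop2^2,
# which makes sqrt(sum of weights) equal to prop2 (assumed width rule)
data_dist2$w <- data_dist2$prop2^2 / data_dist2$n_obs

p <- ggplot(data_dist2, aes(x = reorder(spe, prop2), y = dist)) +
  geom_boxplot(aes(weight = w),
               varwidth = TRUE,
               fill = 'grey10', alpha = 0.3) +
  coord_flip()
p
```

Note that supplying a weight also switches the quantile computation to a weighted one, so the box statistics may differ slightly from the unweighted plot even though all observations within a species carry the same weight.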


Drumy
  • Thank you, I tried this and the other approach given by @Miff. It seemed clear, but I still don't get the right result when I apply it to all the species. I also tried creating a new column with the square root of the number of observations (w) and using weight = prop2/w to "cancel" the weight contributed by the number of observations, but it still doesn't look the way it should... I have to go to a meeting, but I'll try again right after and let you know! – Aurore F Mar 04 '20 at 16:22