1

I am drawing a box plot together with a violin plot using ggplot2 to see the distribution of the data. The quartiles of the box plot are very close to each other, so their labels overlap.

I tried ggrepel::geom_label_repel, but it did not work. If I remove geom_label_repel, some labels overlap.

Here is my R code and a sample data:

library(ggplot2)

dataset <- data.frame(Age = sample(1:20, 100, replace = TRUE))

ggplot(dataset, aes(x = "", y = Age)) +
    geom_violin(position = "dodge", width = 1, fill = "blue") +
    geom_boxplot(width = 0.1, position = "dodge", fill = "red") +
    stat_boxplot(geom = "errorbar", width = 0.1) +
    stat_summary(geom = "label", fun.y = quantile, aes(label = ..y..),
                 position = position_nudge(x = -0.05), size = 3) +
    # the ggrepel attempt that did not work
    ggrepel::geom_label_repel(aes(label = quantile)) +
    ggtitle("") +
    xlab("") +
    ylab("Age")

In addition, is anyone familiar with combining a box plot and a violin plot in a single plot, with the box plot on the left half and the violin plot on the right half? (I am not asking about side-by-side plots, just one plot.)

tjebo
Mehmet Yildirim

3 Answers

1

Here is a slightly different approach, without ggrepel. Half a violin plot is really just a classic density plot drawn vertically, so that is the basis for the plot. I add a horizontal box plot with ggstance::geom_boxploth. For the labels, we can no longer use stat_summary, because we cannot summarise over x values (maybe someone knows how to do that; I don't). So I used this fantastically obscure code by @eipi10 to pre-calculate the quantiles in one go. You can set the position of the box plot to 0 and simply fill the density plot, which avoids hacks such as calculating the segments yourself.

You can then fine-tune the graphs to your liking.

library(tidyverse)
library(ggstance)
#> 
#> Attaching package: 'ggstance'
#> The following objects are masked from 'package:ggplot2':
#> 
#>     geom_errorbarh, GeomErrorbarh

dataset <- data.frame(Age = sample(1:20, 100, replace = TRUE))

# Pre-calculate the quartiles as one row per quantile (columns: name, value)
my_quant <- dataset %>% 
  summarise(Age = list(enframe(quantile(Age, probs = c(0.25, 0.5, 0.75))))) %>% 
  unnest(Age)

my_y <- 0

ggplot(dataset) +
  ggstance::geom_boxploth(aes(x = Age, y = my_y), width = .05) +
  geom_density(aes(x = Age)) +
  annotate(geom = "label", x = my_quant$value, y = my_y, label = my_quant$value) +
  coord_flip()

Now adding a fill.


ggplot(dataset) +
  ggstance::geom_boxploth(aes(x = Age, y = my_y), width = .05) +
  geom_density(aes(x = Age), fill = 'white') +
  annotate(geom = "label", x = my_quant$value, y = my_y, label = my_quant$value) +
  coord_flip()

Created on 2019-07-29 by the reprex package (v0.2.1)
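
If ggrepel is still preferred for the labels, one possible sketch is to pre-compute the quantiles in the same spirit as above and hand them to geom_label_repel as their own data frame, so that each quantile is a single row and a single label. The data frame name `quant_df`, the seed and the `nudge_x` value are assumptions made for illustration; this is not the code from the question or from the answer above.

```r
library(ggplot2)
library(ggrepel)

set.seed(42)
dataset <- data.frame(Age = sample(1:20, 100, replace = TRUE))

# One row per quantile (min, Q1, median, Q3, max), so one label per row.
quant_df <- data.frame(Age = unname(quantile(dataset$Age)))

p <- ggplot(dataset, aes(x = "", y = Age)) +
  geom_violin(width = 1, fill = "blue") +
  geom_boxplot(width = 0.1, fill = "red") +
  geom_label_repel(
    data = quant_df,
    aes(label = Age),
    direction = "y",  # only push labels apart vertically
    nudge_x = -0.3,   # assumed offset: start labels to the left of the box
    size = 3
  ) +
  xlab("") +
  ylab("Age")
p
```

Restricting the repulsion to the y direction keeps the labels in one column next to the box, which is usually easier to read when the quartiles are close together.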

tjebo
  • Thank you for your answer. It works perfectly. I tried to fix the width of the density plot (it may not be logical, but I just tried) to get a box plot and a density plot of consistent size; however, the `width` option did not work with `geom_density()`. Since it did not work, I removed `width=0.05` from `ggstance::geom_boxploth()`. That did not work either, since some variables concentrate around a few values and the box plot cannot be seen for them. Do you know how to set the widths of both plots in a good ratio so I can get a nice plot? – Mehmet Yildirim Jul 31 '19 at 15:42
  • Glad I could help. I am not sure that I fully understand what you mean with "consistently same size b plot and d plot"?... – tjebo Jul 31 '19 at 16:02
  • Since this sample is normally distributed, it looks normal. However, in real data some values dominate the variable. Say 90% of the values are concentrated around 0 and the rest are distributed evenly over other values; then the density plot extends far beyond the width of the box plot. I want to see them in a good ratio. I am drawing this plot for around 40 variables in a `for` loop, so I cannot control the limits for each variable. – Mehmet Yildirim Jul 31 '19 at 21:57
  • Maybe simply try `geom_density(aes(x = ..., y = ..scaled..))` and then the box plot width to something like 0.5 – tjebo Jul 31 '19 at 23:25
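
The suggestion in the last comment could be sketched as a small helper that can be called inside a loop. The function name `half_box_density`, the default `box_width = 0.5` and the toy data are assumptions for illustration. The sketch also swaps ggstance::geom_boxploth for ggplot2's own horizontal box plot (`orientation = "y"`, available from ggplot2 3.3.0), which is a substitution and not what the answer used.

```r
library(ggplot2)

# Hypothetical helper: one half-box / half-density plot per numeric vector.
half_box_density <- function(values, box_width = 0.5) {
  df <- data.frame(value = values)
  ggplot(df) +
    # Horizontal box plot centred on y = 0 with a fixed width.
    geom_boxplot(aes(x = value, y = 0), orientation = "y", width = box_width) +
    # The scaled density peaks at 1, whatever the data look like.
    geom_density(aes(x = value, y = after_stat(scaled)), fill = "white") +
    coord_flip()
}

set.seed(1)
# Skewed toy data: most values at 0, the rest spread out.
skewed <- c(rep(0, 90), sample(1:20, 10, replace = TRUE))
p <- half_box_density(skewed)
p
```

Because the scaled density always runs from 0 to 1, the ratio between the density and a box of width 0.5 stays the same across variables, without setting limits per variable.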
0

When using the standard R boxplot command, use the text command to add the five summary statistics to the graph.
Example:

boxplot(arq1$J00_J99,arq1$V01_Y89,horizontal = TRUE)
text(x = boxplot.stats(arq1$J00_J99)$stats,
     labels = boxplot.stats(arq1$J00_J99)$stats, y = 0.5)
text(x = boxplot.stats(arq1$V01_Y89)$stats,
     labels = boxplot.stats(arq1$V01_Y89)$stats, y = 2.5)

[Figure: label text overlapping on the upper boxplot]

This shows the labels overlapping on the upper boxplot. To avoid this, call text twice, placing different statistics at different y heights:

text(x = boxplot.stats(arq1$V01_Y89)$stats[2:5],
     labels = boxplot.stats(arq1$V01_Y89)$stats[2:5], y = 2.5)
text(x = boxplot.stats(arq1$V01_Y89)$stats[1],
     labels = boxplot.stats(arq1$V01_Y89)$stats[1], y = 2)



[Figure: upper boxplot with the text overlap solved]

Above, statistics 2 to 5 (lower hinge, median, upper hinge and upper whisker end, i.e. roughly the 1st quartile, median, 3rd quartile and maximum) are placed at y = 2.5, and the first one (lower whisker end, roughly the minimum) at y = 2.
The same approach can be used to separate any overlapping statistics on a boxplot.
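
The same idea can be sketched without picking the indices by hand, by alternating the label height for neighbouring statistics. The data below are a simulated stand-in (the `arq1` data frame from the answer is not available here), and the two heights 1.45 and 0.55 are assumed values that suit a single horizontal boxplot drawn at y = 1.

```r
# Simulated, skewed stand-in data (assumption; not the answer's arq1).
set.seed(1)
x <- rexp(100, rate = 1)

# Five statistics: lower whisker end, lower hinge, median,
# upper hinge, upper whisker end.
stats <- boxplot.stats(x)$stats

# Odd-numbered statistics go above the box, even-numbered ones below,
# so two neighbouring statistics never share a row.
label_y <- ifelse(seq_along(stats) %% 2 == 1, 1.45, 0.55)

boxplot(x, horizontal = TRUE)
text(x = stats, y = label_y, labels = round(stats, 2), cex = 0.8)
```

Statistics that are two positions apart (for example the lower whisker end and the median) still share a row, so very tightly packed data may need a third height.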

Rui Barradas
C Lederman