I have a problem with spacing data points in a boxplot. I use the following code.

library(tidyverse)  # attaches ggplot2 and dplyr

DF1 <- data.frame(
  x = c(1, 2, 3, 4, 7, 11, 20, 23, 24, 25, 30),
  y = c(3, 6, 12, 13, 17, 22, NA, NA, NA, NA, NA)
)

# Reshape to long format: columns `variable` and `value`, NA rows included
DF1 <- reshape2::melt(DF1)

DF1 %>%
  group_by(variable) %>%
  arrange(value) %>%
  # One horizontal offset per row of the group, spread across the box width
  mutate(xcoord = seq(-0.25, 0.25, length.out = n())) %>%
  ggplot(aes(x = variable, y = value, group = variable)) +
  geom_boxplot() +
  geom_point(aes(x = xcoord + as.integer(variable)))

This results in the following:

[Image: R boxplot ggplot2]

For x, the data points are evenly distributed from left to right. For y, which has fewer data points, they are not: they are bunched up on one side. How can the code above be modified so that the data points for y are evenly spaced too? I would appreciate any suggestions.

I found a somewhat similar post here, but it did not solve my problem.

Thank you.

camille

1 Answer

The problem is the NA values in y. After you go to long format, you can simply omit them:

plot_data <- DF1 %>%
  na.omit() %>%  # add this here: drop the NA rows before computing positions
  group_by(variable) %>%
  arrange(value) %>%
  mutate(xcoord = seq(-0.25, 0.25, length.out = n()))

ggplot(plot_data, aes(x = variable, y = value, group = variable)) +
  geom_boxplot() +
  geom_point(aes(x = xcoord + as.integer(variable)))
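
To see why the NA rows matter, here is a minimal base R sketch of what happens to the offsets for y. It assumes, as in the question, that `n()` counts the NA rows and that `arrange()` sorts NA values last. The variable names are illustrative only and are not part of the original code.

```r
# The y column from the question, with its five missing values
y <- c(3, 6, 12, 13, 17, 22, NA, NA, NA, NA, NA)

# With the NAs kept, the group has 11 rows, so 11 offsets are generated.
pos_with_na <- seq(-0.25, 0.25, length.out = length(y))

# NAs are sorted last, so the six real values take the first six offsets.
# These cover roughly the left half of the box only.
used_with_na <- pos_with_na[seq_len(sum(!is.na(y)))]
print(range(used_with_na))

# After dropping the NAs, six offsets span the full width.
y_clean <- y[!is.na(y)]
pos_clean <- seq(-0.25, 0.25, length.out = length(y_clean))
print(range(pos_clean))
```

The points for the NA rows are dropped when plotting, which is why only the left-hand offsets show up for y. As an alternative to `na.omit()`, `reshape2::melt()` also accepts `na.rm = TRUE`, which should remove the missing values during the reshape itself.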

Gregor Thomas
  • Wow, terrific! Thanks for your quick answer @gregor! – Peter Duncan Mar 04 '19 at 22:33