
I am new to R and have a problem with ggplot2 and the following dataset (a representative subset of a larger one). The error bars drawn by geom_errorbar do not align with the mean points drawn by geom_point. In several cases the horizontal caps do not even align with the vertical bar of the same error bar: instead of an "I" shape with caps at top and bottom, the caps are detached from the vertical line or off-center.

I have read the help pages for ggplot, geom_point, geom_errorbar, and position_jitter (as well as position_dodge and position_jitterdodge). I have also tried several suggestions from this site, such as altering the aesthetics within the geom_point and geom_errorbar calls (e.g. "How to make dodge in geom_bar agree with dodge in geom_errorbar, geom_point").

Here's a basic data set:

df <- structure(list(
Test = c("A", "B", "C", "D", "A", "C", "D"), 
mean = c(1, 100.793684, 1, 1, 51.615601, 1, 2.456456), 
sd = c(1, 2.045985, 1, 1, 4.790053, 1, 4.250668), 
lower = c(2, 102.839669, 2, 2, 56.405654, 2, 6.707124), 
upper = c(0, 98.747699, 0, 0, 46.825548, 0, -1.79421)), 
row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))

Now the code I am using:

subplot <- ggplot(df, aes(x = Test, y = mean)) +
  geom_point(aes(x= Test, y = mean), 
             position = position_jitter(width = 0.2, height = 0.2))+
  geom_errorbar(aes(ymin = lower, ymax = upper),
                width = 0.1,
                position = position_jitter(width = 0.2, height = 0.2)) 
subplot

This is what I get:

[Image: output of the code above]

I suspect it is something basic that I have missed. I have used the same code in line plots and other scatter plots and it has been fine, so I am lost as to what I have done. I have tested it on two different installations of R on separate computers too.

Any help greatly appreciated.

Z.Lin
bob1

3 Answers

2

First,

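library(ggplot2)

# Rebuild the question's data as a plain data frame
Test <- c("A", "B", "C", "D", "A", "C", "D")
mean <- c(1, 100.793684, 1, 1, 51.615601, 1, 2.456456)
sd <- c(1, 2.045985, 1, 1, 4.790053, 1, 4.250668)
# As in the question's data, `lower` holds mean + sd and `upper` holds mean - sd
lower <- mean + sd
upper <- mean - sd
# One id per row, so that every observation forms its own group when dodging
id <- seq_along(Test)

df <- data.frame(Test, mean, sd, lower, upper, id)

then

# The same position_dodge width in both layers keeps points and bars aligned
subplot <- ggplot(df, aes(x = Test, y = mean, group = id)) +
  geom_point(position = position_dodge(width = 0.2)) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                width = 0.1, position = position_dodge(width = 0.2))
subplot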

image of plot

eipi10
  • Thanks for the help. That also doesn't work with the full dataset, though I'm working on a solution along those lines now. – bob1 Oct 15 '18 at 17:14
2

I posted this data set and problem to the ggplot2 GitHub page. It seems that I was indeed missing something simple: I needed to set a seed in the position_jitter calls so that each geom jitters every point consistently. However, there also seems to be an issue with geom_errorbar, as setting the seed does not fix the crossbar problem.

Upon further investigation (by the GitHub team), it seems that the cross-bars are being jittered independently of the line. As of 23/10/18 the workaround is to use position_dodge or geom_linerange in the meantime.

  ggplot(df, aes(x = Test, y = mean)) +
  geom_point(aes(x= Test, y = mean), 
             position = position_jitter(width = 0.2, height = 0.2, seed = 123))+
  geom_linerange(aes(ymin = lower, ymax = upper),
             position = position_jitter(width = 0.2, height = 0.2, seed = 123))
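
One way to check that a shared seed really lines the two layers up is to compare the computed x positions with layer_data(). The following is a minimal sketch, not part of the original answer; the helper name jit is hypothetical, and lower/upper are recomputed here as mean - sd and mean + sd, which is an assumption about the intended meaning of those columns.

```r
library(ggplot2)

# Data from the question; lower/upper recomputed (assumption: mean -/+ sd)
df <- data.frame(
  Test = c("A", "B", "C", "D", "A", "C", "D"),
  mean = c(1, 100.793684, 1, 1, 51.615601, 1, 2.456456),
  sd   = c(1, 2.045985, 1, 1, 4.790053, 1, 4.250668)
)
df$lower <- df$mean - df$sd
df$upper <- df$mean + df$sd

# Hypothetical helper: builds an identical, seeded jitter for each layer
jit <- function() position_jitter(width = 0.2, height = 0.2, seed = 123)

p <- ggplot(df, aes(x = Test, y = mean)) +
  geom_point(position = jit()) +
  geom_linerange(aes(ymin = lower, ymax = upper), position = jit())

# Extract the data each layer is drawn from and compare horizontal positions
points <- layer_data(p, 1)
ranges <- layer_data(p, 2)
x_match <- isTRUE(all.equal(as.numeric(points$x), as.numeric(ranges$x)))
x_match
```

If x_match is TRUE, each point and its line range share the same horizontal offset. Dropping the seed argument from either call should make the comparison fail on most runs.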

Thanks to all for their help.

bob1
  • `position_dodge` is a better approach for this problem than `position_jitter`, as `position_dodge` allows you to control the spacing between points with a given `Test` value, rather than having it randomly determined by jittering. – eipi10 Oct 15 '18 at 21:48
  • Quite right, `position_dodge` does work well for this approach. However, my full dataset is much larger and `position_dodge` does not allow me to represent each point clearly and has not resolved the issue with the wandering cross-bars. I have tried his approach on my full dataset, now I'll try yours. – bob1 Oct 15 '18 at 21:59
1

It looks like position_jitter is getting applied differently to the different components of the errorbars. That seems like a bug.
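
The effect described above can be imitated in base R. This is only an illustration of the suspected mechanism under made-up values (x, half_width and amount are all hypothetical), not ggplot2's actual code: when x, xmin and xmax each receive their own random offset, the caps drift away from the vertical line, whereas one shared offset per observation keeps them centred.

```r
set.seed(42)

x          <- c(1, 1, 2)   # hypothetical bar positions
half_width <- 0.05         # half of the cap width
amount     <- 0.2          # jitter amount

# Independent jitter: every column gets its own random offsets
indep <- data.frame(
  x    = x + runif(length(x), -amount, amount),
  xmin = (x - half_width) + runif(length(x), -amount, amount),
  xmax = (x + half_width) + runif(length(x), -amount, amount)
)

# Shared jitter: one offset per observation, applied to all three columns
offset <- runif(length(x), -amount, amount)
shared <- data.frame(
  x    = x + offset,
  xmin = x - half_width + offset,
  xmax = x + half_width + offset
)

# A cap is centred when the midpoint of xmin and xmax equals x
is_centred <- function(d) all(abs((d$xmin + d$xmax) / 2 - d$x) < 1e-12)
indep_centred  <- is_centred(indep)
shared_centred <- is_centred(shared)
```

Here shared_centred holds by construction, while indep_centred fails, which mirrors the detached and off-center caps in the question's plot.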

Here's a workaround that might accomplish your goals more directly. Add a column (I'm calling it version here) to distinguish between multiple runs of one Test, group by that column, and then use position_dodge to avoid overlaps.

library(dplyr)
df2 <- df %>% 
  group_by(Test) %>% 
  mutate(version = row_number()) %>% 
  ungroup()

subplot <- ggplot(df2, aes(x = Test, y = mean, group = version)) +
  geom_point(position = position_dodge(width = 0.5))+
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2,
            position = position_dodge(width = 0.5)) 
subplot
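
If dplyr is not available, the same version counter can be built in base R. This is a small sketch of an equivalent step, not what the answer itself uses: ave() applies seq_along() within each Test group and returns the results in the original row order.

```r
Test <- c("A", "B", "C", "D", "A", "C", "D")

# Number the rows within each level of Test, keeping the original row order
version <- ave(seq_along(Test), Test, FUN = seq_along)
version
```

The resulting vector can be added to the data frame with df$version <- version and used as the group aesthetic in the same way.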


Alternatively, we could use facet_grid and have the width change depending upon the number of tests, which will make the error bar widths consistent.

subplot <- ggplot(df2, aes(x = version, y = mean)) +
  geom_point(position = position_dodge(width = 0.5))+
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2,
                position = position_dodge(width = 0.5)) +
  scale_x_continuous(breaks = NULL) +
  facet_grid(. ~ Test, space = "free_x", shrink = TRUE, scales = "free_x")
subplot


Another approach would be to use a discrete scale, as you mention, with a variable that combines Test and version so that each run gets the same width. This could be interaction(Test, version) or, as in the code below, a pasted label. (I couldn't get the ordering to be by Test when using the interaction approach.)

df2 <- df %>% 
  group_by(Test) %>% 
  mutate(version = row_number()) %>% 
  mutate(label = paste(Test, version)) %>%
  ungroup()

subplot <- ggplot(df2, aes(x = label, y = mean)) +
  geom_point(position = position_dodge(width = 0.5))+
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2,
                position = position_dodge(width = 0.5))
subplot
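
For the ordering issue with interaction(), one option that may help is its lex.order argument, which makes the levels vary slowest in the first factor, so runs are grouped by Test. The sketch below was not tried in the original answer; the column name run is hypothetical, and lower/upper are recomputed as mean - sd and mean + sd as an assumption.

```r
library(ggplot2)

df <- data.frame(
  Test = c("A", "B", "C", "D", "A", "C", "D"),
  mean = c(1, 100.793684, 1, 1, 51.615601, 1, 2.456456),
  sd   = c(1, 2.045985, 1, 1, 4.790053, 1, 4.250668)
)
df$lower   <- df$mean - df$sd
df$upper   <- df$mean + df$sd
df$version <- ave(seq_along(df$Test), df$Test, FUN = seq_along)

# lex.order = TRUE orders levels by Test first, then version;
# drop = TRUE removes combinations that do not occur (e.g. "B.2")
df$run <- interaction(df$Test, df$version, lex.order = TRUE, drop = TRUE)
run_levels <- levels(df$run)

subplot <- ggplot(df, aes(x = run, y = mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2)
subplot
```

Because every run has its own position on the discrete axis, no dodging or jittering is needed and all error bars are drawn with the same width.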


Jon Spring
  • Thanks. Unfortunately that doesn't seem to work on the full dataset, where I have a bunch more variables. I'll keep working on it. Any idea why the width of the crossbars varies between the points (cf. "B" and the ones on "A/1")? I see that in my question and in the answer below too. – bob1 Oct 14 '18 at 01:46
  • I've added a variation using facets, which allows for error bars to stay same width regardless of number of versions of each test. – Jon Spring Oct 14 '18 at 02:33
  • Thanks for the help. The full version of my data set is already faceted for another feature. I initially thought the problem was something to do with my faceting, but now I wonder if I can convert my scale to continuous from discrete and see if that helps. – bob1 Oct 15 '18 at 17:12
  • If you need to preserve the faceting for something else, then using a discrete scale might be your best option. I've updated my response to show one. – Jon Spring Oct 15 '18 at 17:26