I have a dataset (dput output below), and my goal is to show boxplots and point plots side by side (see the diagram below).

library(tidyverse)
DataSet <- read.csv("filelocation")
ggplot(data = DataSet, 
   aes(x = id,
       y = result)) + 
geom_boxplot(aes(color = live)) +
facet_wrap( ~ resource, scales = "free_y")

For example with this dataset, c3 would have a boxplot for True, but to the right of it, plot points for False.

Diagram of eventual output

dput output:

structure(list(id = c(101L, 101L, 101L, 101L, 102L, 102L, 102L, 
102L, 103L, 103L, 103L, 103L, 103L, 103L, 103L, 104L, 104L, 104L, 
104L, 104L, 105L, 106L, 106L, 106L, 106L, 106L, 107L, 107L, 107L, 
107L, 108L, 108L, 109L, 109L, 109L, 109L, 109L, 109L, 109L, 109L, 
109L, 109L), resource = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("a", "b"), class = "factor"), result = c(2.12, 
4.72, 4.17, 5.53, 3.6, 3.31, 3.64, 5.33, 4.32, 5.48, 5.93, 3.4, 
3.09, 5.91, 2.93, 1.81, 3.93, 2.22, 4.77, 3.92, 4.08, 3.65, 5.23, 
3.74, 4.03, 3.54, 4.29, 4.3, 2.82, 2.89, 5.41, 4.61, 4, 5.92, 
1.66, 1.65, 1.91, 2.69, 5.28, 2.24, 3.64, 4.77), live = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("f", "t"), class = "factor")), class = "data.frame", row.names = c(NA, 
-42L))

I'd also ideally like to be able to separate the groups with a dividing line, like in the diagram. I've read some R resources but haven't seen any hint that it can be done.

Richard Telford
eaccas
  • Possible duplicate of [How to plot a hybrid boxplot: half boxplot w/ jitter points on the other half](https://stackoverflow.com/questions/49003863/how-to-plot-a-hybrid-boxplot-half-boxplot-w-jitter-points-on-the-other-half) – markus Jun 11 '18 at 21:15
  • You don't have enough points to plot side-by-side boxplots for each `id` in each `resource`. Do you mean to just plot side-by-side boxplots for `resource` only? – acylam Jun 11 '18 at 21:17
  • Edited my post with more relevant data, and my actual original script. You can see that false has far fewer entries, which is why I was hoping for just plot points instead of boxplot. – eaccas Jun 11 '18 at 21:35
  • Regarding the dividing lines: the easiest solution might be `geom_vline(xintercept = c(101.75, 102.75, 103.75))`, which would give you the vertical lines in the plot you provided. *(Someone correct me if I'm wrong, but `facet_wrap()` is only used if you want multiple plots, each **with a separate axis**. In your diagram, all the plots are on the same axis.)* – sam Jun 11 '18 at 23:25

1 Answer

If I understand your question correctly, you want to use a box plot to display TRUE values, and points to display (the few) FALSE values, and you want to divide them up according to the resource.

I am going to use the data which is currently shown in your question, which @Richard Telford has kindly cleaned up.

With facet_wrap()

We are going to use subset() to split your data by the value of live. Rows where live is 't' are plotted as a box plot, and rows where live is 'f' are plotted as points. I used green and red for the two groups respectively, but you may want to change this.

# cleanData is the data frame from the dput() output in the question
ggplot() + 
  # box plot built from the live == 't' rows only
  geom_boxplot(data = subset(cleanData, live == 't'),
    aes(x = id, y = result, group = resource), color = 'green') +
  # live == 'f' rows drawn as individual points
  geom_point(data = subset(cleanData, live == 'f'),
    aes(x = id, y = result), color = 'red', size = 3) +
  facet_wrap( ~ resource, scales = 'fixed') +
  scale_x_continuous(breaks = 101:109, minor_breaks = NULL)

using facet_wrap
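
If you would rather have one box per id with the 'f' points sitting just to the right of it, something like the following might work. This is only a sketch under assumptions: `toy` is made-up data with the same column names as the question, and the box width and nudge distance are arbitrary values I picked so the two do not overlap.

```r
library(ggplot2)

# Hypothetical toy data with the same columns as the question's data frame.
set.seed(1)
toy <- data.frame(
  id       = rep(101:104, each = 6),
  resource = rep(c("a", "b"), each = 12),
  result   = round(runif(24, 1.5, 6), 2),
  live     = rep(c("t", "t", "t", "t", "f", "f"), times = 4)
)

p <- ggplot(mapping = aes(x = id, y = result)) +
  # One box per id, built from the live == "t" rows only.
  geom_boxplot(data = subset(toy, live == "t"),
               aes(group = id), width = 0.4, color = "green") +
  # live == "f" rows drawn as points, shifted right of the box for the same id.
  geom_point(data = subset(toy, live == "f"),
             position = position_nudge(x = 0.35), color = "red", size = 3) +
  facet_wrap(~ resource, scales = "free_x") +
  scale_x_continuous(breaks = 101:104, minor_breaks = NULL)

print(p)
```

Grouping by id instead of resource is a different choice from the code above, so the result will not match the plot shown there.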

Without facet_wrap()

With scales = 'fixed', every panel shares the same axis ranges, so you may end up with a lot of empty space (as in the plot above). The code below does not use facet_wrap(). Instead it draws a single plot with a vertical line that approximately divides the a and b values of the resource variable.

# cleanData is the same data frame used in the facet_wrap() example
ggplot() + 
  geom_boxplot(data = subset(cleanData, live == 't'),
    aes(x = id, y = result, group = resource), color = 'green') +
  geom_point(data = subset(cleanData, live == 'f'),
    aes(x = id, y = result), color = 'red', size = 3) +
  scale_x_continuous(breaks = 101:109, minor_breaks = NULL) +
  # dashed divider, placed by eye between the two resources
  geom_vline(xintercept = 104.15, linetype = 'dashed')

same plot
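
If you want a dividing line between every pair of neighbouring ids rather than one hand-placed line, the positions can be computed from the data. A minimal sketch, assuming numeric ids; `toy`, `ids` and `mids` are names I made up for illustration:

```r
library(ggplot2)

# Hypothetical toy data; only the id and result columns matter here.
set.seed(2)
toy <- data.frame(
  id     = rep(c(101, 102, 103, 104), each = 5),
  result = round(runif(20, 1.5, 6), 2)
)

# Midpoints between consecutive ids: 101.5, 102.5, 103.5.
ids  <- sort(unique(toy$id))
mids <- head(ids, -1) + diff(ids) / 2

p <- ggplot(toy, aes(x = id, y = result)) +
  geom_boxplot(aes(group = id), width = 0.4) +
  # One dashed divider halfway between each pair of neighbouring ids.
  geom_vline(xintercept = mids, linetype = "dashed") +
  scale_x_continuous(breaks = ids, minor_breaks = NULL)

print(p)
```

Because the midpoints come from the ids themselves, the lines stay in the right place if ids are added or removed.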

Hopefully this puts you on the track to working out exactly what you wanted.

Community
sam