
I am trying to make boxplots with jitter strips using code from a website (https://z3tt.github.io/beyond-bar-and-box-plots/). The code on that website is shown in the screenshot below.

I am not sure what the "g"s stand for, or whether they refer to the author's data frame. My code so far is this:

df %>%  
  geom_boxplot(aes(data = df, x = groupci, y = df$weight_v1_3, color = groupci, fill = groupci)) +
  scale_y_continuous(breaks = 1:9) +
  scale_color_manual(values = weight_v1_3, guide = "none") +
  scale_fill_manual(values = weight_v1_3, guide = "none")
df + 
  geom_boxplot(alpha = .5, size = 1.5, outlier.size = 5)

df  + 
  geom_boxplot
  (aes(fill = groupci, fill = after_scale(colorspace::lighten(fill, .7))),
    size = 1.5, outlier.shape = NA) +
  geom_jitter(width = .1, size = 7, alpha = .5)

The error messages are `mapping must be created by aes()` and `Error in is_reference(x, quote(expr = )) : object 'weight_v1_3' not found`.

The variable from my data that I want to portray is called "weight_v1_3". I also want to show boxplots side-by-side of a control and intervention group, similar to the example from the website below. Ultimately, I would even like to show control and intervention group side-by-side and then marked for 3 visits, so 3 pairs of control and intervention in one graph for one variable, here weight. I don't think this is included in the code yet. If anybody knows how to do this additionally, that would be awesome.

I just used the same scaling from the website to see what it would look like and then maybe alter it later. Unfortunately my code does not work yet. Does anybody have some ideas? Thank you in advance!


Website example code I am trying to use

Sample data:

structure(list(pseudonym = c(1L, 2L, 4L, 5L, 6L, 7L, 3L, 8L, 
9L, 10L, 11L, 1L, 2L, 4L, 5L, 6L, 7L, 3L, 8L, 9L, 10L, 11L, 1L, 
2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), control.0.1. = c(0L, 0L, 0L, 
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 
1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), intervention.0.1. = c(1L, 
1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), visit = c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), weight.V1.3 = c(60L, 
60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 59L, 59L, 59L, 
59L, 59L, 59L, 59L, 59L, 59L, 59L, 59L, 57L, 57L, 57L, 57L, 57L, 
57L, 57L, 57L, 57L)), class = "data.frame", row.names = c(NA, 
-31L))
stefan
Rnewbie
  • To help us help you, would you mind sharing [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data? To share your data, you could type `dput(NAME_OF_DATASET)` into the console and copy & paste the output starting with `structure(....` into your post. If your dataset has a lot of observations you could do e.g. `dput(head(NAME_OF_DATASET, 20))` for the first twenty rows of data. – stefan Jan 30 '22 at 12:39
  • Also: Please do not post an image of code/data/errors [for these reasons](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question/285557#285557) – stefan Jan 30 '22 at 12:39
  • Hi Stefan, Thank you for answering. I tried to put dput(name of my dataset), but I unfortunately got a message back, which says: Error: unexpected symbol in "dput(Descriptive statistics_ProLon_V1". – Rnewbie Jan 30 '22 at 12:47
  • Well, as the dataset in the code you posted is named `df`, you could try `dput(df)`. `Descriptive statistics_ProLon_V1` looks more like a filename. If it is actually the name of your dataset as stored in memory, you have to wrap it in backticks "`", as the name contains a space. – stefan Jan 30 '22 at 12:54
  • Thank you, Stefan -this worked, after I also resolved another issue. The sample data is above in my original post. – Rnewbie Jan 30 '22 at 13:16
  • Thanks :) sorry, I forgot to label as code! – Rnewbie Jan 30 '22 at 13:23

1 Answer


Your code has several issues and syntax errors. To mention a few:

  1. A ggplot always starts with ggplot(). df %>% geom_boxplot(..) will not work, and neither will adding layers directly to a data frame as in df + geom_boxplot(..).

  2. There is no data aesthetic. To tell ggplot2 which dataset to use, pass it to the data argument of ggplot(), e.g. ggplot(data = df) or, for short, ggplot(df). Inside aes(), refer to columns by their bare names, i.e. weight_v1_3 instead of df$weight_v1_3.

  3. Doing scale_color_manual(values = weight_v1_3) does not work either. The column by which you want to color the plot is mapped inside aes(). Inside scale_color_manual() you specify which colors to use. As values is evaluated as ordinary R code, R looks for an object called weight_v1_3, which is why you get the "object 'weight_v1_3' not found" error.

  4. Finally, the g in the code you referenced is a ggplot object, i.e. you can assign a plot to a variable and then add additional layers to this object.

  5. Before going on, I would suggest working through a ggplot2 tutorial to get the basics, e.g. the author of the post you referenced has a nice tutorial: https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/
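
To illustrate points 1 and 4, here is a minimal sketch of how a plot is built up from an object plus layers. The toy data frame is made up for illustration; only the column names match your data.

```r
library(ggplot2)

# Toy data, made up for illustration only
toy <- data.frame(
  groupci = rep(c("control", "intervention"), each = 5),
  weight_v1_3 = c(58, 59, 60, 61, 62, 55, 57, 58, 59, 60)
)

# ggplot() creates the plot object: data plus aesthetic mappings, no layers yet
g <- ggplot(toy, aes(x = groupci, y = weight_v1_3))

# Each geom_*() call adds one layer on top of the existing ones
g_box <- g + geom_boxplot(outlier.shape = NA)
g_both <- g_box + geom_jitter(width = .1)

length(g$layers)      # 0
length(g_box$layers)  # 1
length(g_both$layers) # 2

# Printing the object draws the plot
g_both
```

So g on its own is an empty canvas that knows the data and the mappings, while geom_boxplot() and geom_jitter() are the layers that draw something on it.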

Note: I also did some minor data wrangling steps like renaming your columns and adding a groupci column.

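library(ggplot2)
library(dplyr)

# lower-case the column names, replace dots with underscores, and add a group label
df <- df %>% 
  rename_with(~ gsub("\\.", "_", tolower(.x))) %>% 
  mutate(groupci = ifelse(control_0_1_ == 1, "control", "intervention"))

# one color per group, assigned in alphabetical order of the group labels
pal <- c("red", "blue")

# stage() maps fill to groupci and lightens it after scaling (needs the colorspace package)
g <- ggplot(df, aes(x = groupci, y = weight_v1_3)) +
  geom_boxplot(
    aes(fill = stage(groupci, after_scale = colorspace::lighten(fill, .7))),
    alpha = .5, size = 1.5, outlier.size = 5
  )

# breaks = 1:9 is carried over from the website example; the weights here are 57 to 60,
# so adjust the breaks to get y-axis labels
g + 
  geom_jitter(aes(color = groupci), width = .1, size = 7, alpha = .5) +
  scale_y_continuous(breaks = 1:9) +
  scale_color_manual(values = pal, guide = "none") +
  scale_fill_manual(values = pal, guide = "none")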
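
For the three visits, one possible approach is to put visit on the x-axis and dodge the boxes by group. This is only a sketch under assumptions: the data below is simulated purely for illustration (it is not your data), and toy, dodge_width and p are just names I chose.

```r
library(ggplot2)

# Simulated data for illustration only (not the asker's measurements)
set.seed(1)
toy <- data.frame(
  visit = factor(rep(1:3, each = 20)),
  groupci = rep(rep(c("control", "intervention"), each = 10), times = 3),
  weight_v1_3 = rnorm(60, mean = 60, sd = 3)
)

pal <- c(control = "red", intervention = "blue")

# Use the same dodge width for boxes and points so they line up
dodge_width <- .8

p <- ggplot(toy, aes(x = visit, y = weight_v1_3)) +
  # one box per visit and group, placed side by side
  geom_boxplot(
    aes(fill = groupci),
    alpha = .5, outlier.shape = NA,
    position = position_dodge(width = dodge_width)
  ) +
  # jittered points, dodged the same way as the boxes
  geom_point(
    aes(color = groupci),
    alpha = .5,
    position = position_jitterdodge(jitter.width = .1, dodge.width = dodge_width)
  ) +
  scale_fill_manual(values = pal) +
  scale_color_manual(values = pal, guide = "none") +
  labs(x = "Visit", fill = "Group")

p
```

Note that visit has to be a factor, e.g. via factor(visit). If it stays numeric, ggplot2 treats the x-axis as continuous and does not split the boxes by visit.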

stefan
  • Hi Stefan, this is really great! I have a few questions about the code: 4. What did you mean by saying "you could assign a ggplot object to a variable. Then add additional layers to this object."? Specifically, what do you mean by "layering"? 1. This is probably a very basic thing, but you mentioned that the first function when using ggplot always needs to be "ggplot()". What does "geom_boxplot" stand for, then? I will now look at the tutorial you suggested - Thank you so much for your help! :) – Rnewbie Jan 30 '22 at 14:17