plot group and category means with group_by

Question

I am new to R and trying to figure out a way to plot means for individual samples as well as group means with ggplot. I am following this article on R-bloggers (last paragraph):

https://www.r-bloggers.com/plotting-individual-observations-and-group-means-with-ggplot2/

This is my code:

library(dplyr)
library(ggplot2)

gd <- meanplot1 %>%
  group_by(treatment, value) %>%
  summarise(measurement = mean(measurement))

ggplot(meanplot1, aes(x = value, y = measurement, color = treatment)) +
  geom_line(aes(group = sample), alpha = 0.3) +
  geom_line(data = gd, size = 3, alpha = 0.9) +
  theme_bw()

Whilst the sample lines are being shown, the group means aren't. I get the error `geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?` Upon adding `group = 1`, I get a weirdly mixed category mean, which is not what I am looking for.

I have scrolled through a lot of articles already but couldn't find an answer. I would be so happy if somebody could help me out here! :)

My data (meanplot1) is formatted like this:

   treatment    sample   value measurement
1    control control 1 initial          20
2    control control 1      26          NA
3    control control 1     26'          28
12   control control 2 initial          22
13   control control 2      26          NA
14   control control 2     26'          36
15   control control 2      28          45
67  stressed  stress 1 initial          37
68  stressed  stress 1      26          NA
69  stressed  stress 1     26'          17
78  stressed  stress 2 initial          36
79  stressed  stress 2      26          NA
80  stressed  stress 2     26'          25

For the excerpt above, I am hoping to see 6 lines: one each for stress 1, stress 2, control 1 and control 2, plus one mean line for all rows with treatment = control and one for all rows with treatment = stressed.

Output of `dput(gd)`:

structure(list(treatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("control", "stressed"), class = "factor"), value = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L, 11L), .Label = c("26", "26'", "28", "28'", 
"30", "30'", "32", "32'", "34", "34'", "initial"), class = "factor"), 
measurement = c(NA, 32.3333333333333, 39.5, 30.3333333333333, 
31.8333333333333, 31.8333333333333, NA, 36, 34.6666666666667, 
36, 24.6666666666667, NA, 25.3333333333333, 33.3333333333333, 
32, 50.1666666666667, 39.1666666666667, NA, 33.5, 24.3333333333333, 
27.3333333333333, 36)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -22L), vars = list(treatment), drop = TRUE, .Names = c("treatment", 
"value", "measurement"))

Output of `dput(meanplot1)`:

structure(list(treatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("control", 
"stressed"), class = "factor"), sample = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 
9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L), .Label = c("control 1", 
"control 2", "control 3", "control 4", "control 5", "control 6", 
"stress 1", "stress 2", "stress 3", "stress 4", "stress 5", "stress 6"
), class = "factor"), value = structure(c(11L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("26", "26'", 
"28", "28'", "30", "30'", "32", "32'", "34", "34'", "initial"
), class = "factor"), measurement = c(20L, NA, 28L, 18L, 17L, 
19L, 34L, NA, 23L, 29L, 27L, 22L, NA, 36L, 45L, 31L, 40L, 44L, 
NA, 49L, 40L, 39L, 32L, NA, 35L, 57L, 30L, 37L, 29L, NA, 44L, 
37L, 46L, 20L, NA, 39L, 27L, 30L, 40L, 25L, NA, 29L, 50L, 30L, 
26L, NA, 28L, 45L, 47L, 27L, 35L, NA, 24L, 22L, 35L, 28L, NA, 
28L, 45L, 27L, 28L, 24L, NA, 47L, 30L, 39L, 37L, NA, 17L, 29L, 
29L, 31L, 29L, NA, 37L, 21L, 27L, 36L, NA, 25L, 41L, 51L, 66L, 
50L, NA, 33L, 25L, 22L, 36L, NA, 33L, 45L, 26L, 72L, 59L, NA, 
33L, 26L, 25L, 33L, NA, 21L, 33L, 25L, 29L, 21L, NA, 26L, 20L, 
16L, 22L, NA, 30L, 27L, 28L, 57L, 41L, NA, 28L, 23L, 17L, 52L, 
NA, 26L, 25L, 33L, 46L, 35L, NA, 44L, 31L, 57L)), .Names = c("treatment", 
"sample", "value", "measurement"), class = "data.frame", row.names = c(NA, 
-132L))

Please provide [reproducible examples](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for your question. The expected output would also help greatly. — Adam Quek, Apr 17 '17 at 07:41
Moreover, `26` sometimes has an apostrophe, like `26'`. Is this intended? — KoenV, Apr 17 '17 at 08:17
These are temperature values: `26` stands for 26°C on the first sampling day and `26'` for 26°C on the second sampling day. `initial` is the behaviour on the very first experimental day. — Pauline, Apr 17 '17 at 08:19
So if your `values` for `meanplot1` is categorical, how did you manage to run `gd <- meanplot1 %>% group_by(treatment, value) %>% summarise(value = mean(value))` in the first place? — Adam Quek, Apr 17 '17 at 08:28
Sorry, sorry, sorry! I renamed my variables for this post to make them clearer and messed up; it is `summarise(measurement = mean(measurement))`. — Pauline, Apr 17 '17 at 08:39
Your sample data are a mess. Please [edit] your Q and add the result of `dput(gd)` - Thank you — Uwe, Apr 17 '17 at 08:48
Hi Uwe, thanks for your reply. Unfortunately I don't understand what you mean by "Q", but I have added the result of `dput(gd)`. — Pauline, Apr 17 '17 at 09:01
@Axeman the data at the beginning is an excerpt from `meanplot1`; I have now added the output of `dput(meanplot1)`. Thanks for taking the time to look at this question! :) — Pauline, Apr 17 '17 at 09:28

score 1 · Accepted Answer · answered Apr 17 '17 at 09:32


I suppose you are aiming to plot the treatment means.

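By default, since you are using a categorical x-axis, ggplot2 groups the data by the interaction of all discrete variables in the plot, here x and color. Because `gd` contains exactly one row per combination of `value` and `treatment`, every such group holds a single point, which is what triggers the `geom_path` message. You only want to group by treatment, however, so we'll add the correct grouping to the call.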
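The default grouping described above can be checked directly by counting rows per value × treatment combination in the summarised data. This is a minimal sketch under assumptions: `toy` is a hypothetical stand-in for `meanplot1` built from the first three rows of each sample in the question's excerpt, and base R's `aggregate()` stands in for the `group_by()`/`summarise()` pipeline so the check runs without extra packages.

```r
# Hypothetical stand-in for meanplot1: first three rows per sample of the excerpt
toy <- data.frame(
  treatment   = rep(c("control", "stressed"), each = 6),
  sample      = rep(c("control 1", "control 2", "stress 1", "stress 2"), each = 3),
  value       = factor(rep(c("initial", "26", "26'"), times = 4),
                       levels = c("initial", "26", "26'")),
  measurement = c(20, NA, 28, 22, NA, 36, 37, NA, 17, 36, NA, 25)
)

# Stand-in for gd: mean measurement per treatment x value; na.pass keeps NA
# rows so that mean() returns NA, as in the dplyr version
gd_toy <- aggregate(measurement ~ treatment + value, data = toy,
                    FUN = mean, na.action = na.pass)

# With a discrete x and a discrete colour, ggplot2's default group is the
# interaction of the two; count how many rows fall into each such group
rows_per_group <- table(interaction(gd_toy$treatment, gd_toy$value))
print(rows_per_group)  # every count is 1, so there is nothing to connect
```

Every group has a single row, so a line geom has no second point to draw to; setting `group = treatment` merges the three x positions of each treatment into one group.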

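ggplot(meanplot1, aes(x = value, y = measurement, color = treatment)) +
  # thin lines: one per individual sample
  geom_line(aes(group = sample), alpha = 0.3) +
  # thick lines: treatment means from gd, grouped by treatment only
  # (ggplot2 >= 3.4.0 prefers linewidth = 3 over size = 3 for lines)
  geom_line(aes(group = treatment), data = gd, size = 3, alpha = 0.9) +
  theme_bw()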
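To see what the explicitly grouped mean layer contains, the sketch below rebuilds the call on hypothetical toy data (the first three rows per sample from the question's excerpt) and inspects the computed layer with `ggplot2::layer_data()`. It assumes dplyr >= 1.0.0 (for `.groups`) and ggplot2 >= 3.4.0 (for `linewidth`); `toy`, `gd_toy` and `mean_layer` are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for meanplot1: first three rows per sample of the excerpt
toy <- data.frame(
  treatment   = rep(c("control", "stressed"), each = 6),
  sample      = rep(c("control 1", "control 2", "stress 1", "stress 2"), each = 3),
  value       = factor(rep(c("initial", "26", "26'"), times = 4),
                       levels = c("initial", "26", "26'")),
  measurement = c(20, NA, 28, 22, NA, 36, 37, NA, 17, 36, NA, 25)
)

# Treatment means per x value; the all-NA "26" rows give NA means
gd_toy <- toy %>%
  group_by(treatment, value) %>%
  summarise(measurement = mean(measurement), .groups = "drop")

p <- ggplot(toy, aes(x = value, y = measurement, colour = treatment)) +
  geom_line(aes(group = sample), alpha = 0.3) +
  geom_line(aes(group = treatment), data = gd_toy, linewidth = 3, alpha = 0.9) +
  theme_bw()

# Layer 2 is the mean layer: two groups (one per treatment); its NA rows are
# still present here and interrupt each line when the plot is drawn
mean_layer <- layer_data(p, 2)
print(mean_layer[, c("x", "y", "group")])
```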

Also note that

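ggplot(meanplot1, aes(x = value, y = measurement, color = treatment)) +
  geom_line(aes(group = sample), alpha = 0.3) +
  # compute the treatment means on the fly instead of using gd
  # (ggplot2 >= 3.3.0 spells fun.y as fun)
  stat_summary(aes(group = treatment), fun.y = mean, geom = 'line', size = 3, alpha = 0.9) +
  theme_bw()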

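gives essentially the same plot, without the interruptions: `stat_summary()` drops rows where `measurement` is `NA` before computing the means, so no `NA` breaks the mean lines.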
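A sketch of the `stat_summary()` version with current argument names, on hypothetical toy data built from the question's excerpt (first three rows per sample). It assumes ggplot2 >= 3.4.0, where `fun` replaces `fun.y` and `linewidth` replaces `size` for lines; `toy` and `summary_layer` are illustrative names.

```r
library(ggplot2)

# Hypothetical stand-in for meanplot1: first three rows per sample of the excerpt
toy <- data.frame(
  treatment   = rep(c("control", "stressed"), each = 6),
  sample      = rep(c("control 1", "control 2", "stress 1", "stress 2"), each = 3),
  value       = factor(rep(c("initial", "26", "26'"), times = 4),
                       levels = c("initial", "26", "26'")),
  measurement = c(20, NA, 28, 22, NA, 36, 37, NA, 17, 36, NA, 25)
)

p <- ggplot(toy, aes(x = value, y = measurement, colour = treatment)) +
  geom_line(aes(group = sample), alpha = 0.3) +
  # one mean per treatment and x value, computed from the raw data
  stat_summary(aes(group = treatment), fun = mean, geom = "line",
               linewidth = 3, alpha = 0.9) +
  theme_bw()

# NA rows are removed before summarising, so only finite means remain
# (ggplot2 warns about the removed rows)
summary_layer <- layer_data(p, 2)
print(summary_layer[, c("x", "y", "group")])
```

In the toy data every `26` measurement is missing, so the summary layer simply has no point there and each mean line runs straight from `initial` to `26'`.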


Axeman


Wow, great, thank you so much, that works for me! :) I really appreciate it! Now I am only left with the problem of "initial" jumping to the end of the x-axis for some reason. – Pauline Apr 17 '17 at 09:47
Check the order of your factor levels. See [for example](http://stackoverflow.com/questions/14402242/keep-same-order-as-in-data-files-when-using-ggplot). – Axeman Apr 17 '17 at 09:48
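The level-order issue can be reproduced with the x labels alone. A minimal sketch, assuming the labels from the question's `value` column in the order they were recorded; `default_levels` and `ordered_levels` are illustrative names. `factor()` sorts its levels, and digits sort before letters, so `initial` becomes the last level and therefore the last position on a discrete axis; passing `levels = unique(...)` keeps the order of first appearance instead.

```r
# x labels in the order they were recorded (from the question's data)
value <- c("initial", "26", "26'", "28", "28'", "30", "30'",
           "32", "32'", "34", "34'")

# Default: levels are sorted, so "initial" ends up last
default_levels <- levels(factor(value))

# Explicit levels in order of first appearance put "initial" first
ordered_levels <- levels(factor(value, levels = unique(value)))

print(default_levels)
print(ordered_levels)
```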
Amazing, thank you! `meanplot2 <- transform(meanplot1, value = factor(value, levels = unique(value)))` worked for me. Also thanks for the second version of the plot; I was working on a way to interpolate the missing data when I saw yours! I suppose I cannot use `fun.y` to interpolate the other lines as well, since it is only an argument of `stat_summary()`? – Pauline Apr 17 '17 at 10:10
There's nothing stopping you from using `stat_summary(aes(group = sample), fun.y = mean, geom = 'line', alpha = 0.3)` though. – Axeman Apr 17 '17 at 10:15
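Following that suggestion, both the sample lines and the treatment means can go through `stat_summary()`. Note that this does not estimate the missing values: with one observation per sample and x value the "mean" is the observation itself, and once the `NA` rows are dropped the line simply joins the neighbouring non-missing points. A sketch on hypothetical toy data built from the question's excerpt, assuming ggplot2 >= 3.4.0 (`fun`, `linewidth`); `toy` and `sample_layer` are illustrative names.

```r
library(ggplot2)

# Hypothetical stand-in for meanplot1: first three rows per sample of the excerpt
toy <- data.frame(
  treatment   = rep(c("control", "stressed"), each = 6),
  sample      = rep(c("control 1", "control 2", "stress 1", "stress 2"), each = 3),
  value       = factor(rep(c("initial", "26", "26'"), times = 4),
                       levels = c("initial", "26", "26'")),
  measurement = c(20, NA, 28, 22, NA, 36, 37, NA, 17, 36, NA, 25)
)

p <- ggplot(toy, aes(x = value, y = measurement, colour = treatment)) +
  # thin lines: per-sample "means" (single values), NA rows dropped first
  stat_summary(aes(group = sample), fun = mean, geom = "line", alpha = 0.3) +
  # thick lines: treatment means
  stat_summary(aes(group = treatment), fun = mean, geom = "line",
               linewidth = 3, alpha = 0.9) +
  theme_bw()

# Layer 1 holds the sample lines: 4 samples x 2 non-missing x values
sample_layer <- layer_data(p, 1)
print(sample_layer[, c("x", "y", "group")])
```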
