My data looks like the following. I need to create a line plot or bar plot of the average val for each group, i.e., for each combination of status and category in the CSV file.
The data in dput() format:

df <-
structure(list(val = c(4608, 4137, 6507, 5124, 
3608, 34377, 5507, 5624, 4608, 4137, 6507, 5124, 
3608, 3437, 5507, 5507, 5624), status = c("1x", 
"1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy", 
"50xy", "50xy", "60xy", "60xy", "70xy", "xyz", 
"xyz", "xyz"), category = c("A", "C", "A", "A", 
"A", "B", "B", "C", "B", "C", "A", "B", "C", 
"B", "B", "C", "C")), row.names = c(NA, 
-17L), class = "data.frame")

I tried the following code but could not figure out the whole thing.

library(ggplot2)
ggplot(df, aes(x = status, y = val, group = category, color = source)) + 
      geom_smooth(method = "loess")

Any help with plotting them group-wise (for example, the mean val for status 2x and category B) in a single window would be really appreciated. Thank you.

Uwe
temp

2 Answers

You can do:

library(dplyr)
library(ggplot2)
df %>%
    group_by(category, status) %>%
    # one row per category/status group, holding the mean of val
    summarise(agg = mean(val)) %>%
    ggplot(aes(status, agg, fill = category, color = status)) +
    geom_col(position = "dodge")
Uwe
YOLO
  • The OP wants to plot the means of groups, not just the values. Also, just as shorthand, `geom_col()` is equivalent to `geom_bar(stat = "identity")`. – camille Dec 20 '18 at 20:11
  • The bars have uneven widths; the last solution in [this answer](https://stackoverflow.com/questions/11020437/consistent-width-for-geom-bar-in-the-event-of-missing-data/46825844#46825844) solved it for me. – Rui Barradas Dec 20 '18 at 21:35
  • Thank you, @YOLO, for the hint. It worked; later I converted it into line plots. – temp Dec 23 '18 at 01:57
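
Regarding the uneven bar widths mentioned in the comments, one possible fix is `position_dodge2(preserve = "single")`. The sketch below assumes ggplot2 3.0.0 or later and dplyr 1.0.0 or later; it may or may not be the exact solution given in the linked answer, and the name `means` is only illustrative.

```r
library(dplyr)
library(ggplot2)

# Data from the question
df <- structure(list(val = c(4608, 4137, 6507, 5124,
  3608, 34377, 5507, 5624, 4608, 4137, 6507, 5124,
  3608, 3437, 5507, 5507, 5624), status = c("1x",
  "1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy",
  "50xy", "50xy", "60xy", "60xy", "70xy", "xyz",
  "xyz", "xyz"), category = c("A", "C", "A", "A",
  "A", "B", "B", "C", "B", "C", "A", "B", "C",
  "B", "B", "C", "C")), row.names = c(NA,
  -17L), class = "data.frame")

# One row per status/category combination with its mean
# (`means` is a hypothetical name, not from the original answer)
means <- df %>%
  group_by(status, category) %>%
  summarise(agg = mean(val), .groups = "drop")

# preserve = "single" sizes every bar as if its status had the maximum
# number of categories, instead of stretching bars to fill the slot
p <- ggplot(means, aes(status, agg, fill = category)) +
  geom_col(position = position_dodge2(preserve = "single"))
p
```

With the default `position = "dodge"`, a status that has only one category (such as 70xy) gets a single bar that takes up the whole slot, which is where the uneven widths come from.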

This question already has an accepted answer, which requires computing the aggregated mean(val) for each status/category group beforehand.

However, ggplot2 includes transformations (or stats) which enable us to create the desired plot in one go without utilizing other packages:

library(ggplot2)
ggplot(df, aes(x = status, y = val, group = category, colour = category)) +
  stat_summary(geom = "line", fun = "mean")

This creates a line plot of the mean values as requested by the OP:

[Image: line plot of mean val by status, one line per category]

Alternatively, we can tell geom_line() to use a summary statistic:

ggplot(df, aes(status, val, group = category, colour = category)) +
  geom_line(stat = "summary", fun = "mean")

which creates the same plot.
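
Since the question also mentions bar plots, the same approach should carry over to bars. This is a minimal sketch rather than part of the original answer. It assumes ggplot2 3.3.0 or later (for the `fun` argument) and maps `fill` instead of `colour` so that the bars are filled by category:

```r
library(ggplot2)

# Data from the question
df <- structure(list(val = c(4608, 4137, 6507, 5124,
  3608, 34377, 5507, 5624, 4608, 4137, 6507, 5124,
  3608, 3437, 5507, 5507, 5624), status = c("1x",
  "1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy",
  "50xy", "50xy", "60xy", "60xy", "70xy", "xyz",
  "xyz", "xyz"), category = c("A", "C", "A", "A",
  "A", "B", "B", "C", "B", "C", "A", "B", "C",
  "B", "B", "C", "C")), row.names = c(NA,
  -17L), class = "data.frame")

# stat_summary() computes mean(val) per status and category;
# geom = "bar" draws the result as dodged bars
p <- ggplot(df, aes(status, val, fill = category)) +
  stat_summary(geom = "bar", fun = mean, position = "dodge")
p
```

As with any dodged bar plot of this data, bar widths will differ between status levels that have different numbers of categories.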

stat_summary() can also be used to show the original data and the summary statistics combined in one plot:

ggplot(df, aes(status, val, group = category, colour = category)) +
  geom_point() +
  stat_summary(geom = "line", fun = "mean")

[Image: the same line plot of mean val with the individual data points overlaid]

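This can help to better understand the structure of the underlying data, e.g., to spot outliers. Note that the y scale differs from the previous plot because it now has to cover the individual values rather than only the group means.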
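
The group means that the lines are drawn through can also be checked numerically. As a minimal sketch (base R's aggregate() is used here for illustration and is not part of the original answer; `means` is a hypothetical name):

```r
# Data from the question
df <- structure(list(val = c(4608, 4137, 6507, 5124,
  3608, 34377, 5507, 5624, 4608, 4137, 6507, 5124,
  3608, 3437, 5507, 5507, 5624), status = c("1x",
  "1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy",
  "50xy", "50xy", "60xy", "60xy", "70xy", "xyz",
  "xyz", "xyz"), category = c("A", "C", "A", "A",
  "A", "B", "B", "C", "B", "C", "A", "B", "C",
  "B", "B", "C", "C")), row.names = c(NA,
  -17L), class = "data.frame")

# Mean of val for every status/category combination present in the data
means <- aggregate(val ~ status + category, data = df, FUN = mean)
means

# The 2x/B group contains the value 34377, which pulls its mean up
means$val[means$status == "2x" & means$category == "B"]  # 19942
```

The 2x/B mean is the average of 34377 and 5507, so a single large value accounts for the peak of the B line.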

Uwe