
These examples are from the book R for Data Science.

ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

What does stat = "identity" do?

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))

What does group = 1 do? I didn't see any difference even when I set group = 0, 2, etc.

Scransom
    The default behavior of `geom_bar` is to count the number of rows of data (using `stat="count"`) for each x-value and plot that as the bar height. However, if your data are pre-summarized (that is, you already have a column of counts, like `y=freq` in your example), then use `stat="identity"`, which tells `geom_bar` to use the `y` aesthetic (`freq` in this case) for the bar heights rather than counting rows of data. `group=1` is explained in [this SO answer](https://stackoverflow.com/a/39879232/496488). – eipi10 Jul 11 '17 at 06:03
    Please read the `ggplot2` documentation: http://ggplot2.tidyverse.org/index.html, in particular: [Aesthetics: grouping](http://ggplot2.tidyverse.org/reference/aes_group_order.html) and http://ggplot2.tidyverse.org/reference/geom_bar.html – Uwe Jul 11 '17 at 06:36

2 Answers


stat = "identity" tells ggplot that rather than aggregating multiple rows of data and using the number of rows as the height of the bar, instead the height of the bar is already given in a column of data (mapped to y). In the current version of ggplot2, the recommendation is to use geom_col() instead of geom_bar(stat = "identity"). This is explained in the help at ?geom_bar:

If you want the heights of the bars to represent values in the data, use geom_col instead. geom_bar uses stat_count by default: it counts the number of cases at each x position. geom_col uses stat_identity: it leaves the data as is.
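As a rough illustration of the point above, here is a minimal sketch comparing the three variants. The `demo` table is rebuilt from `diamonds` here as an assumption, so that the example is self-contained; the plot and vector names are made up for illustration.

```r
library(ggplot2)

# Pre-summarised counts: one row per cut, with the count stored in `freq`.
# (Built from `diamonds` here so the example runs on its own.)
demo <- as.data.frame(table(cut = diamonds$cut), responseName = "freq")

# stat = "identity": use `freq` directly as the bar height.
p_identity <- ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

# geom_col() is the recommended shorthand for the same thing.
p_col <- ggplot(data = demo) +
  geom_col(mapping = aes(x = cut, y = freq))

# Default geom_bar() on the raw data: stat_count() counts the rows per cut.
p_count <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# layer_data() returns the values each layer actually draws.
heights_identity <- layer_data(p_identity)$y
heights_col      <- layer_data(p_col)$y
heights_count    <- layer_data(p_count)$y
```

All three sets of bar heights should match: the first two read the heights from `freq`, the third computes them by counting rows. Print any of the plot objects (for example `print(p_col)`) to draw it.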


As @eipi10 points out, the group part of the question is a duplicate; it is already well answered here.
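For the group part, a small sketch may help show why any constant behaves the same. It assumes ggplot2 3.3.0 or later, where `after_stat(prop)` is the current spelling of `..prop..`; the object names are only illustrative.

```r
library(ggplot2)

# No explicit group: each cut forms its own group, so the proportion is
# computed within a single bar and is always 1.
p_default <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

# A constant group puts all rows into one group, so each bar's proportion
# is its count divided by the total number of rows.
p_group1 <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# Any other constant defines the same single group.
p_group2 <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 2))

# Inspect the computed bar heights without drawing the plots.
prop_default <- layer_data(p_default)$y
prop_group1  <- layer_data(p_group1)$y
prop_group2  <- layer_data(p_group2)$y
```

Here `prop_default` is 1 for every bar, while `prop_group1` and `prop_group2` are identical and sum to 1. That is why changing the constant makes no visible difference: only the grouping matters, not the value.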

Gregor Thomas

Group does not take integers or numbers; it basically takes functions, such as:

  • paste(variable, rep)

  • interaction(variable, rep)

  • interaction(variable, rep, sep = ' ')

These are used to produce exactly identical levels, since ggplot will coerce the result to a factor, which will have identical levels (or at least levels that differ in label only) in both cases.

For more info, you can check out here

Shujath
    This isn't fully correct - `group` *does not* take functions, that would imply something like `group = mean` or `group = rnorm`. You *can* give `group` constants, such as integers (see the working example in OP's question with `group = 1`). You can also give `group` a data column, or a function *of one or more data columns*, which is what you show using `interaction()`. It would probably make more sense if you demonstrated with built-in data such as the `diamonds` data in the question. – Gregor Thomas Jul 11 '17 at 06:38
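To make the comment above concrete with the `diamonds` data, here is a minimal sketch of `group` mapped to a data column and to a function of two data columns. It assumes ggplot2 3.3.0 or later for `after_stat()`; the object names are illustrative only.

```r
library(ggplot2)

# group mapped to a data column: proportions are computed within each color.
p_by_color <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = color))

# group mapped to a function of data columns: one group per
# color/clarity combination.
p_by_both <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop),
                         group = interaction(color, clarity)))

# Use the computed `prop` column rather than `y`, because the default
# stacking position shifts `y` for bars that share an x position.
d_color <- layer_data(p_by_color)
d_both  <- layer_data(p_by_both)

# Within every group, the proportions across cuts add up to 1.
prop_sums_color <- tapply(d_color$prop, d_color$group, sum)
prop_sums_both  <- tapply(d_both$prop, d_both$group, sum)
```

In both cases the proportions sum to 1 within each group, which is the same rule that `group = 1` applies to a single group covering the whole data set.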