
Using this data:

library(ggplot2)
dd <- data.frame(id = c("A", "A", "B", "B"), prepost = c("pre", "post"), 
         value = 1:4)

this one works:

qplot(id, value, data = dd, fill = prepost, geom = "bar")

However, the next one gives the indicated error message. The only difference between the two is the addition of group = prepost at the end of the command; since we had already written fill = prepost, that should be the default group anyway.

> qplot(id, value, data = dd, fill = prepost, geom = "bar", group = prepost)
 Error in pmin(y, 0) : object 'y' not found

We can fix up the last one by adding stat = "identity" like this:

qplot(id, value, data = dd, fill = prepost, geom = "bar", group = prepost, 
      stat = "identity")

I have two questions:

(a) Why does the qplot which gave the error message not work when the others do work?

(b) If we use a continuous y aesthetic with geom_bar then what is supposed to happen if one does not specify stat? From the first qplot it seems that in that case it acts as if stat="identity" but in the presence of group specifying stat="identity" or not reveals a difference.

(By the way, this question seems somewhat related, although it's different enough that it does not seem to answer this one: Issue with ggplot2, geom_bar, and position="dodge": stacked has correct y values, dodged does not)

user1189687

1 Answer


A good question!

I will echo an earlier comment by @joran:

I generally find that people are less confused with ggplot once they learn to stop using qplot

Your qplot call can be recreated using

ggplot(dd) + geom_bar(aes(x = id, y = value, fill = prepost))

Now, if you read the help for geom_bar, it lists the aesthetics it understands. group is not one of them, so perhaps you can't expect it to work as you wish when you do this.

If you read the help for group you will see that it is an aesthetic that allows you to override the default interaction of all discrete variables.

If you group by prepost alone, you are not grouping by the discrete x-axis variable id, which the default grouping would also include.

Therefore

ggplot(dd) + 
 geom_bar(aes(x=id, y = value, fill = prepost, group = interaction(id, prepost)))

works, but the grouping is entirely redundant as this is the default.
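
To see the difference between the two groupings concretely, here is a small base-R sketch (no ggplot2 needed). It assumes the default group is formed from id and prepost, as described above, and simply counts how many rows of dd land in each group; the variable names are made up for illustration.

```r
dd <- data.frame(id = c("A", "A", "B", "B"),
                 prepost = c("pre", "post"),
                 value = 1:4)

# Default grouping: interaction of the discrete variables (id and prepost)
default_group <- interaction(dd$id, dd$prepost)
rows_per_default_group <- table(default_group)

# Explicit grouping by prepost alone (what group = prepost asks for)
rows_per_prepost_group <- table(dd$prepost)

print(rows_per_default_group)
print(rows_per_prepost_group)
```

With the default grouping every group holds exactly one row, hence one y value. With group = prepost each group holds two rows, so there is no longer a single y value per group.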

If you specify just prepost or id as your grouping, it confuses stat_bin (the underlying method that crunches the numbers to create the values for the bar plot). Hence you need to use stat_identity instead.
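
A minimal sketch of the working version written with ggplot() rather than qplot, assuming ggplot2 is installed. stat = "identity" tells geom_bar to plot value as given instead of counting rows, so the explicit group no longer trips it up. (In newer ggplot2 releases geom_col() should do the same job; that function is not part of the original thread, so treat it as an aside.)

```r
library(ggplot2)

dd <- data.frame(id = c("A", "A", "B", "B"),
                 prepost = c("pre", "post"),
                 value = 1:4)

# stat = "identity": bar heights come straight from `value`, no counting
p <- ggplot(dd, aes(x = id, y = value, fill = prepost, group = prepost)) +
  geom_bar(stat = "identity")

p
```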


EDIT: As noted by the OP in the comments below, this is related to a known issue, and the next version (or the current dev version on GitHub) will give better warnings.

From the NEWS

  • stat_bin now produces warning messages when it is used with set or mapped y values. Previously, it was possible to use stat_bin and also set/map y values; if there was one y value per group, it would display the y values from the data, instead of the counts of cases for each group. This usage is deprecated and will be removed in a future version of ggplot2. (Winston Chang. Fixes #632)
mnel
  • (a) See ?aes_group_order (b) When you provide melted data to ggplot, it does not know whether you have summarized the data or not. `stat=identity` merely tells ggplot that the data is already summarized. In the help for ?geom_bar this is covered: `Sometimes, bar charts are used not as a distributional summary, but instead of a dotplot. Generally, it's preferable to use a dotplot (see geom_point) as it has a better data-ink ratio. However, if you do want to create this type of plot, you can set y to the value you have calculated, and use stat='identity'` – Brandon Bertelsen Nov 14 '12 at 02:40
  • @mnel, I don't think this really has anything to do with `qplot` vs. `ggplot`, since this fails too: `ggplot(dd, aes(id, value, fill = prepost, group = prepost)) + geom_bar()`. But on your other point you are right: I forgot about `id`. However, despite that, it does work when we add `stat = "identity"`, so that still leaves question (b). – user1189687 Nov 14 '12 at 02:49
  • @brandon, Of course it works with `stat="identity"`. I already mentioned that in the question. – user1189687 Nov 14 '12 at 02:50
  • @user1189687, the default for `stat` in `geom_bar` is `bin`, thus `stat_bin` is called (as I state in the answer) – mnel Nov 14 '12 at 02:54
  • @mnel, Yes, `stat_bin` is the default but it does not actually seem to be using that if you specify `y` in these cases so I don't think that that is relevant. – user1189687 Nov 14 '12 at 02:57
  • I have just noticed that in this link https://github.com/hadley/ggplot2/blob/master/NEWS it mentions that specifying `y` with `stat_bin` is going to issue a warning in the next version of ggplot2 and that in the future it will be an error, so I guess the intention is that users should not specify `y` with `stat_bin`. – user1189687 Nov 14 '12 at 02:59
  • @mnel, you did get the point about me missing `id`, so if you add the link to the NEWS I just posted to your post, I will accept it. – user1189687 Nov 14 '12 at 03:31