
In the following code, how does the formula passed as the first argument to boxplot keep the correct correspondence between b and a after the levels of a have been reordered?

a <- as.factor(c("TX", "NY", "WA"))
levels(a)
b <- c(5, 3, 2)
boxplot(b ~ a)
# Order the levels of a according to their value in b
a_reordered <- reorder(a, b)
levels(a_reordered)
boxplot(b ~ a_reordered)

Why doesn't b need to be reordered as well?

edit: I replaced my example with the concrete example of @Marius

user492922

1 Answer


In your boxplot(quantity ~ bymedian) call, the order of states on the x-axis is determined by the order of levels for the bymedian factor. Compare levels(x$State) to levels(bymedian), and you'll see why the two variables behave differently when used in a plot. Note that the data itself in bymedian hasn't changed order.

A quick example:

a <- as.factor(c("TX", "NY", "WA"))
levels(a)
b <- c(5, 3, 2)
boxplot(b ~ a)
# Order the levels of a according to their value in b
a_reordered <- reorder(a, b)
levels(a_reordered)
boxplot(b ~ a_reordered)

And just to make it clear what it means to say that the actual data hasn't changed:

> a
[1] TX NY WA
Levels: NY TX WA
> a_reordered
[1] TX NY WA
# Don't be confused by this extra attr(, "scores") bit: the line
# above is the actual data stored in the vector
#attr(,"scores")
#NY TX WA 
# 3  5  2 
Levels: WA NY TX
> b
[1] 5 3 2
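
The same point can be checked without plotting. The following is a minimal sketch, not part of the original example: the names codes_before, codes_after, and groups are introduced here for illustration only. It compares the integer codes stored inside each factor and uses split() to show how the values of b are grouped by position.

```r
a <- factor(c("TX", "NY", "WA"))
b <- c(5, 3, 2)
a_reordered <- reorder(a, b)

# The labels, element by element, are identical in both factors
identical(as.character(a), as.character(a_reordered))

# Only the mapping from label to integer code differs, because
# the levels are stored in a different order
codes_before <- as.integer(a)           # codes relative to levels NY, TX, WA
codes_after  <- as.integer(a_reordered) # codes relative to levels WA, NY, TX

# Grouping pairs b[i] with a_reordered[i] by position, so 5 still
# lands in the "TX" group; only the order of the groups changes
groups <- split(b, a_reordered)
groups
```

Because the pairing is positional, reordering b as well would break the correspondence rather than fix it.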
Marius
  • The question is why doesn't b need to be reordered? I know that a_reordered has a "scores" attribute. Does the formula or boxplot use that attribute somehow? – user492922 Jan 21 '13 at 00:58
  • `b` doesn't need to be reordered because the data in `a_reordered` is still in the same order, so `"TX"` is still in the same position in `a_reordered` as `5` is in `b`. Therefore, 5 will still be used as the y-value wherever "TX" is on the x-axis. – Marius Jan 21 '13 at 01:01
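
On the "scores" attribute raised in the comments: a quick check suggests it plays no part in the grouping. This is a sketch under assumptions; a_plain and same_groups are hypothetical names, and a_plain is a factor built by hand with the same level order but no "scores" attribute. model.frame() is used only to show the row-by-row pairing that a formula produces.

```r
a <- factor(c("TX", "NY", "WA"))
b <- c(5, 3, 2)
a_reordered <- reorder(a, b)

# Hypothetical factor: same level order as a_reordered, built
# directly, so it carries no "scores" attribute
a_plain <- factor(as.character(a), levels = c("WA", "NY", "TX"))
is.null(attr(a_plain, "scores"))

# Both factors split b into the same groups in the same order
same_groups <- identical(split(b, a_plain), split(b, a_reordered))
same_groups

# The formula pairs the two variables row by row
mf <- model.frame(b ~ a_reordered)
mf
```

If same_groups is TRUE, boxplot(b ~ a_plain) and boxplot(b ~ a_reordered) should draw the same boxes in the same order, which indicates that the level order, not the attribute, determines the x-axis.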