This question follows on from an earlier question and its answers.

First some toy data:

df = read.table(text = 
"School      Year    Value 
 A           1998    5
 B           1999    10
 C           2000    15
 A           2000    7
 B           2001    15
 C           2002    20", sep = "", header = TRUE)

The original question asked how to plot Value-Year lines for each School. The answers more or less correspond to p1 and p2 below. But also consider p3.

library(ggplot2)

(p1 <- ggplot(data = df, aes(x = Year, y = Value, colour = School)) +       
   geom_line() + geom_point())

(p2 <- ggplot(data = df, aes(x = factor(Year), y = Value, colour = School)) +       
  geom_line(aes(group = School)) + geom_point())

(p3 <- ggplot(data = df, aes(x = factor(Year), y = Value, colour = School)) +       
  geom_line() + geom_point())

Both p1 and p2 do the job. The difference between p1 and p2 is that p1 treats Year as numeric whereas p2 treats Year as a factor. Also, p2 contains a group aesthetic in geom_line. But when the group aesthetic is dropped as in p3, the lines are not drawn.

The question is: Why is the group aesthetic necessary when the x-axis variable is a factor but the group aesthetic is not needed when the x-axis variable is numeric?

Sandy Muspratt

1 Answer

In the words of Hadley himself:

The important thing [for a line graph with a factor on the horizontal axis] is to manually specify the grouping. By default ggplot2 uses the combination of all categorical variables in the plot to group geoms - that doesn't work for this plot because you get an individual line for each point. Manually specify group = 1 indicates you want a single line connecting all the points.
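The default grouping described in the quote can be inspected directly. The following is a minimal sketch, assuming the toy data from the question; `n_groups` is a hypothetical helper (not part of ggplot2) that counts the distinct values of the `group` column that `ggplot_build()` returns for the first layer.

```r
library(ggplot2)

# Toy data from the question
df <- read.table(text = "School Year Value
A 1998 5
B 1999 10
C 2000 15
A 2000 7
B 2001 15
C 2002 20", header = TRUE)

# Hypothetical helper: number of distinct groups assigned to the first layer
n_groups <- function(p) length(unique(ggplot_build(p)$data[[1]]$group))

# Numeric x: School is the only discrete variable, so one group per school
p_numeric <- ggplot(df, aes(Year, Value, colour = School)) + geom_line()

# Factor x, no group: groups are the Year-by-School combinations
p_factor <- ggplot(df, aes(factor(Year), Value, colour = School)) + geom_line()

# Factor x, explicit group: back to one group per school
p_grouped <- ggplot(df, aes(factor(Year), Value, colour = School)) +
  geom_line(aes(group = School))

n_groups(p_numeric)  # 3
n_groups(p_factor)   # 6, one per observation, so no line can be drawn
n_groups(p_grouped)  # 3
```

With this data every Year-School combination occurs exactly once, which is why the factor version ends up with six single-point groups.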

You can also group the points in quite different ways, as demonstrated by koshke here.
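As a rough illustration of how the choice of `group` changes the result (a sketch on the question's toy data, not koshke's original example), `group = 1` joins all points into one line, while `group = School` draws one line per school. The object names `p_single` and `p_school` are made up for this example.

```r
library(ggplot2)

# Toy data from the question
df <- read.table(text = "School Year Value
A 1998 5
B 1999 10
C 2000 15
A 2000 7
B 2001 15
C 2002 20", header = TRUE)

# One line through all points, ordered by the factor levels of Year
p_single <- ggplot(df, aes(factor(Year), Value)) +
  geom_line(aes(group = 1)) +
  geom_point(aes(colour = School))

# One line per school
p_school <- ggplot(df, aes(factor(Year), Value, colour = School)) +
  geom_line(aes(group = School)) +
  geom_point()

# Count the groups ggplot2 assigns to the line layer of each plot
groups_single <- unique(ggplot_build(p_single)$data[[1]]$group)
groups_school <- unique(ggplot_build(p_school)$data[[1]]$group)
length(groups_single)
length(groups_school)
```

Printing `p_single` and `p_school` shows the two plots side by side in an interactive session.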

daedalus
  • So when `x` is numeric, assumptions about grouping the observations by the factor aren't made (and can't be made because obviously `x` is not a factor). I guess what was confusing me in the case of numeric `x` is that it's still possible to get the single line by specifying `group = 1` but multiple lines with no `group` specification. – Sandy Muspratt Apr 27 '12 at 22:39
  • Yes, @Sandy Muspratt, in your latter case, numeric `x` provides a natural ordering but no factor to group by, hence other categorical variables come into play. It took me a while to wrap my head around it, now it just is logical. (Thanks for accepting). – daedalus Apr 27 '12 at 22:46