
According to this ggplot2 tutorial, the following code produces a multi-colored scatterplot:

library(ggplot2)
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(col=state), size=3) +  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
plot(gg)


How can I make multiple regression lines (i.e. one for each state)?

wwl
  • 1
    @Hack-R I don't believe this is an exact dupe, the answer below uses just one call to `geom_smooth` to draw all regression lines. – Rui Barradas Jun 09 '18 at 18:35
  • 3
    @RuiBarradas It seems the *question* is a dupe, but I agree the *answer* may not be. It raises an interesting point. Right now we are supposed to close duplicate *questions*. However, maybe it should be dupe question-answer pairs that we close? You could ask on Meta. – Hack-R Jun 09 '18 at 18:43
  • 1
    @Hack-R [Done](https://meta.stackoverflow.com/questions/369259/should-we-vote-to-close-dupe-questions-or-dupe-pairs-question-answer). – Rui Barradas Jun 09 '18 at 19:17
  • 1
    @wwl I can recommend reading `?geom_smooth`. In the Examples you find "_Smoothes are automatically fit to each group (defined by categorical aesthetics or the group aesthetic)_"; `ggplot(mpg, aes(displ, hwy, colour = class)) + geom_smooth(se = FALSE, method = lm)` – Henrik Jun 09 '18 at 20:45

1 Answer


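The `col=state` mapping sits inside the `aes()` of `geom_point`, so it applies to that layer only. `geom_smooth` never sees the grouping, which is why it fits a single line through all the data. One option is to move `col=state` into the `aes()` of the `ggplot()` call itself, where every layer inherits it. The modified code: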

library(ggplot2)

gg <- ggplot(midwest, aes(x=area, y=poptotal, col=state)) + 
  geom_point(size=3) +  # Color is inherited from ggplot(), so points still vary by state.
  geom_smooth(method="lm", size=1, se=FALSE) +  # One lm fit per state; se=FALSE hides the confidence bands.
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population",
       x="Area", caption="Midwest Demographics")
plot(gg)
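
If you would rather keep the color mapping local to the points, as in the question, another option is to give `geom_smooth` its own grouping. The following is only a sketch of that idea, not the tutorial's code: the name `gg_grouped` is made up for illustration, and the explicit `formula = y ~ x` is the same fit `method = "lm"` uses by default, spelled out to avoid the message ggplot2 otherwise prints. With `aes(group = state)` you get one line per state, all drawn in the same fixed color.

```r
library(ggplot2)

# gg_grouped is a hypothetical name used only for this sketch.
gg_grouped <- ggplot(midwest, aes(x = area, y = poptotal)) +
  # Color stays a point-only mapping, as in the original question.
  geom_point(aes(col = state), size = 3) +
  # group = state splits the data per state without changing the line color,
  # so each state gets its own linear fit drawn in "firebrick".
  geom_smooth(aes(group = state), method = "lm", formula = y ~ x,
              col = "firebrick", se = FALSE) +
  coord_cartesian(xlim = c(0, 0.1), ylim = c(0, 1000000)) +
  labs(title = "Area Vs Population", subtitle = "From midwest dataset",
       y = "Population", x = "Area", caption = "Midwest Demographics")

plot(gg_grouped)
```

Replacing `aes(group = state)` with `aes(col = state)` and dropping the fixed `col = "firebrick"` should instead color each line to match its state, which is equivalent to the version above that maps `col=state` in `ggplot()`.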


MKR