I am trying to add a linear regression model to my plot. I have this data frame:

           watershed        sqm        cfs
3 deerfieldwatershed 1718617392 22703.8851
5     greenwatershed  233458430  1637.4895
6     northwatershed  240348182  3281.9921
8     southwatershed   68031782   867.6428

and my current code is:

ggplot(dischargevsarea, aes(x = sqm, y = cfs, color = watershed)) + 
  geom_point(aes(color = watershed), size = 2) + 
  labs(y= "Discharge (cfs)", x = "Area (sq. m)", color = "Watershed") + 
  scale_color_manual(values = c("#BAC4C1", "#37B795", 
                                "#00898F", "#002245"),
                     labels = c("Deerfield", "Green", "North",
                                "South")) + 
  theme_minimal() + 
  geom_smooth(method = "lm", se = FALSE)

When this runs, it adds a line to the points in the legend, but no line shows up on the graph (see image below). I suspect it is drawing a line individually for each point, but I want one regression line for all four points. How would I get that line to show up? Thanks.

[Figure: graph of discharge vs. area of 4 watersheds]

Jamie

1 Answer

You're right: because color = watershed is mapped in the top-level aes(), the points are grouped by watershed, so geom_smooth fits a separate regression line for each category. In your example each category contains a single point, so no line can be drawn, which is why you don't get a single regression line.

To get one regression line through all points, map color only in the aes of geom_point. Alternatively, set inherit.aes = FALSE in geom_smooth so that it ignores the mappings from ggplot(), and give it its own x and y mappings.
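The inherit.aes = FALSE alternative could look like the sketch below. The data frame is rebuilt by hand from the values in the question, and the object name p is only illustrative. Because geom_smooth no longer inherits the color mapping, it treats all four points as one group.

```r
library(ggplot2)

# Data from the question, rebuilt as a plain data frame
dischargevsarea <- data.frame(
  watershed = c("deerfieldwatershed", "greenwatershed",
                "northwatershed", "southwatershed"),
  sqm = c(1718617392, 233458430, 240348182, 68031782),
  cfs = c(22703.8851, 1637.4895, 3281.9921, 867.6428)
)

# Keep color in the top-level aes(), but stop geom_smooth() from inheriting it
p <- ggplot(dischargevsarea, aes(x = sqm, y = cfs, color = watershed)) +
  geom_point(size = 2) +
  geom_smooth(aes(x = sqm, y = cfs), method = "lm", formula = y ~ x,
              se = FALSE, inherit.aes = FALSE) +
  theme_minimal()

p
```

With inherit.aes = FALSE the smooth layer needs its own x and y mappings, which is why they are repeated inside geom_smooth.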

To display the equation on the graph (as asked in the comments), you can use the stat_poly_eq function from the ggpmisc package (this SO post describes its use: Add regression line equation and R^2 on graph):

library(ggplot2)
library(ggpmisc)

# Map color only in geom_point() so geom_smooth() fits one line through all points
ggplot(df, aes(x = sqm, y = cfs)) +
  labs(y = "Discharge (cfs)", x = "Area (sq. m)", color = "Watershed") +
  scale_color_manual(values = c("#BAC4C1", "#37B795",
                                "#00898F", "#002245"),
                     labels = c("Deerfield", "Green", "North",
                                "South")) +
  theme_minimal() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  # Print the fitted equation and R^2 on the plot
  # (after_stat() replaces the deprecated ..eq.label.. notation)
  stat_poly_eq(formula = y ~ x,
               aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")),
               parse = TRUE) +
  geom_point(aes(color = watershed))
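As a cross-check on the equation that stat_poly_eq prints, the same straight-line fit can be computed with base R's lm(). This is a minimal sketch: the data frame is rebuilt by hand from the values in the question, and the names fit and r2 are only illustrative.

```r
# Data from the question, rebuilt as a plain data frame
df <- data.frame(
  watershed = c("deerfieldwatershed", "greenwatershed",
                "northwatershed", "southwatershed"),
  sqm = c(1718617392, 233458430, 240348182, 68031782),
  cfs = c(22703.8851, 1637.4895, 3281.9921, 867.6428)
)

# Same model that geom_smooth(method = "lm", formula = y ~ x) fits
fit <- lm(cfs ~ sqm, data = df)

# Intercept and slope of the regression line
coef(fit)

# Coefficient of determination (R^2)
r2 <- summary(fit)$r.squared
r2
```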


Data

df <- structure(list(watershed = c("deerfieldwatershed", "greenwatershed", 
"northwatershed", "southwatershed"), sqm = c(1718617392L, 233458430L, 
240348182L, 68031782L), cfs = c(22703.8851, 1637.4895, 3281.9921, 
867.6428)), row.names = c(NA, -4L), class = "data.frame")
dc37
  • Perfect, thank you so much! Now if I wanted to add the equation of this line to the graph, how would I do that? – Jamie Feb 11 '20 at 17:50