Displaying Multiple Regression Equations

Question

I have three regressions in one plot that I am trying to display the equation of each for. I've been working off of this question to try and do this. However, the filtering doesn't seem to do anything and it displays the same equation 3 times.

The end goal is to compare cpue in relation to veg, while controlling for location (block), and get the slopes/r^2 values for each of the three regression lines.

Data

cpue<- structure(list(lake = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L), veg = c(254.8026498, 219.9422136, 450.9662078, 484.8605026, 
407.1662151, 286.7015617, 351.6441798, 179.9959443, 340.4276843, 
247.2907435, 502.4119071, 336.4259995, 349.1543197, 281.7493811, 
201.8284859, 325.6380404, 288.3855723, 230.8755861, 214.8890894, 
326.6376698, 214.7468224, 132.0511504, 335.2727641, 336.8727253, 
143.8923225, 277.3053436, 302.7005649, 355.0332852, 307.5736711, 
371.8407176, 168.7645221, 365.9156811, 349.205548, 273.8392697, 
171.4513348, 197.1067049, 350.5833827, 202.9605797, 365.3415045, 
413.2762633, 329.8539209, 377.1415341, 180.8524994, 217.4007852, 
258.5909286, 146.7092479, 258.7440138, 393.2014549, 492.6719497, 
208.5002392, 219.1466664, 182.1366352, 308.0534171, 317.6037795, 
131.7534807, 324.0011761, 469.5861988, 237.4492916, 318.6897863, 
47.94967582, 223.5382632, 386.2227607, 343.7657123, 493.6393726, 
204.2960349, 294.4218332, 178.7555635, 454.0358039, 207.1363947, 
364.6063223, 462.8508521, 292.8613255, 330.3893897, 209.1769838, 
237.4264742, 427.8856667), cpue = c(32.63512612, 47.98168449, 
33.26735173, 14.41435377, 30.94664495, 40.26817963, 41.26204388, 
31.63227286, 36.97932408, 21.54620143, 34.27556883, 6.506644061, 
32.24677471, 38.24536746, 30.95968644, 24.86408391, 31.15438304, 
21.69779047, 39.86223079, 27.92263229, 23.55684281, 34.6157024, 
42.06943746, 24.70597527, 28.36396188, 50.34591832, 55.06361184, 
48.69468021, 26.00084784, 44.77320597, 14.56328001, 33.29291085, 
21.55078237, 29.95980975, 40.61006429, 43.46931237, 26.26407484, 
15.87009067, 39.47297313, 20.50811378, 35.66157343, 35.64563497, 
44.47319537, 42.06574907, 40.16356125, 35.57462201, 32.10051291, 
34.1254268, 34.21084448, 28.18410732, 32.11249307, 38.39890418, 
31.24778375, 29.76951583, 41.52508487, 34.48914051, 28.30923803, 
29.33886042, 37.57268795, 59.29849175, 28.9317113, 41.27342427, 
38.44878019, 44.53768204, 44.48611219, 33.15553274, 34.48894561, 
34.86722967, 31.92515626, 50.04825584, 53.67528105, 37.53150868, 
33.16255301, 33.22374846, 28.28172263, 42.5795616), block = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", 
"2", "3"), class = "factor")), row.names = c(NA, -76L), class = "data.frame")

Code

# Make lm() with blocking variable----------
lm_eqn2 <- function(df2){
  m2 <- lmer(cpue ~ veg + (1|block), cpue);
  eq2 <- substitute(italic(CPUE) == a + b*","~~italic(r)^2~"="~r2, # Write CPUE = a+b, r^2 = x
                   list(a = format(unname(coef(m2)[1]), digits = 4), # define 'a'
                        b = format(unname(coef(m2)[2]), digits = 2), # define 'b'
                        r2 = format(summary(m)$r.squared, digits = 3))) # define 'r2'
  as.character(as.expression(eq)); # declare expression as a character
}


ggplot(cpue, aes(x=veg, y=cpue, col=block))+
  geom_point()+ 
  geom_smooth(method="lm", show.legend=F, se=F)+
  annotate("text", x=100, y=20, label= lm_eqn2(cpue %>% filter(block==1)), parse=T)+
  annotate("text", x=200, y=30, label= lm_eqn2(cpue %>% filter(block==2)), parse=T)+
  annotate("text", x=300, y=40, label= lm_eqn2(cpue %>% filter(block==3)), parse=T)

When I try to view the equation for each line with the following code:

lm_eqn2(cpue %>% filter(block==2))

it returns the same equation for each blocking number that I filter it by. This makes me think there's something wrong with the code that I made the model and the equation with? The only thing different (that I can tell) from the linked question is that my model has a blocking variable. Not sure if that would actually affect anything though.

Any help would be greatly appreciated.

Try with `m2 <- lmer(cpue ~ veg + (1|block), df2);`. You always use the unfiltered dataset `cpue`. — stefan, Dec 09 '21 at 19:55
I did try this, however when trying to plot the regression it returns: "Error: grouping factors must have > 1 sampled level". Looking around for a solution to this now. Thanks for your input! — hugh_man, Dec 09 '21 at 20:29

Dylan_Gomes · Answer 1 · 2021-12-09T20:05:47.967

You have a few problems here. Firstly, it isn't good practice to use the same name for the dataframe and a vector within. It makes lines like lmer(cpue ~ veg + (1|block), cpue); and ggplot(cpue, aes(x=veg, y=cpue, col=block))+ confusing to many.

But also, using cpue here for the dataframe within your function, means that your function doesn't care what you are passing to it later. Such that m2 <- lmer(cpue ~ veg + (1|block), cpue); is the same every time - hence the same equation is being produced. cpue %>% filter(block==2) is ignored as an argument because df2 doesn't exist within your function. So you need something like this:

lm_eqn2 <- function(df2){
  m2 <- lmer(cpue ~ veg + (1|block), df2); ## note the change to df2 here
  eq2 <- substitute(italic(CPUE) == a + b*","~~italic(r)^2~"="~r2, 
                    list(a = format(unname(coef(m2)[1]), digits = 4), 
                         b = format(unname(coef(m2)[2]), digits = 2), 
                         r2 = format(summary(m2)$r.squared, digits = 3))) 
  as.character(as.expression(eq2)); 
}

** also note that m and eq were not found (in your original code), so I changed them to m2 and eq2 respectively.

This gives the error:

 Error: grouping factors must have > 1 sampled level

which makes sense, because you've fit block as a random intercept in your model code, yet you are filtering your data by the blocking factor. So there is only one "type" of blocking factor in each of the lines cpue %>% filter(block==1), cpue %>% filter(block==2), and cpue %>% filter(block==3). That means there is no information added to your regression when you use (1|block), since block is now a constant.

You might want to explain what you are hoping to do with this blocking factor. Some relevant posts: https://stats.stackexchange.com/q/4700/238878 and https://stats.stackexchange.com/q/31569/238878

My blocking factor just represents different lakes that each survey site is located in. So block 1, 2, and 3 represent lake #1, lake #2, and lake #3. And each row represents a different survey site (block/lake 1 has 25 sites, block 2 has 22 sites, and block 3 has 27 sites. We just want to see the relationship between vegetation and cpue (the variable), while controlling for lake/location. The end goal is to obtain the slope/r^2 for the regression lines for each block/lake — hugh_man, Dec 09 '21 at 20:22
Okay, but the way you've set up your model is that each lake *has* the same slope for vegetation, just different intercepts. What you are doing when you write `lm_eqn2(cpue %>% filter(block==1))` is you are re-running the model with only data from block == 1, which means the block term is meaningless *within* the model. — Dylan_Gomes, Dec 09 '21 at 21:29

Displaying Multiple Regression Equations

1 Answers1