How to use results from different regression models in a scatterplot built using group_by in R?

Question

I would like to add 2 different regression curves, coming from different models, in a scatter plot. Let's use the example below:

Weight=c(12.6,12.6,16.01,17.3,17.7,10.7,17,10.9,15,14,13.8,14.5,17.3,10.3,12.8,14.5,13.5,14.5,17,14.3,14.8,17.5,2.9,21.4,15.8,40.2,27.3,18.3,10.7,0.7,42.5,1.55,46.7,45.3,15.4,25.6,18.6,11.7,28,35,17,21,41,42,18,33,35,19,30,42,23,44,22)
Increment=c(0.55,0.53,16.53,55.47,80,0.08,41,0.1,6.7,2.2,1.73,3.53,64,0.05,0.71,3.88,1.37,3.8,40,3,26.3,29.7,10.7,35,27.5,60,43,31,21,7.85,63,9.01,67.8,65.8,27,40.1,31.2,22.3,35,21,74,75,12,19,4,20,65,46,9,68,74,57,57)
Id=c(rep("Aa",20),rep("Ga",18),rep("Za",15))
df=data.frame(Id,Weight,Increment)

The scatter plot looks like this:

plot_df <- ggplot(df, aes(x = Weight, y = Increment, color=Id)) + geom_point()

I tested a linear and an exponential regression model and could extract the results following loki's answer there:

linear_df <- df %>% group_by(Id) %>% do(model = glance(lm(Increment ~ Weight,data = .))) %>% unnest(model)
exp_df <- df %>% group_by(Id) %>% do(model = glance(lm(log(Increment) ~ Weight,data = .))) %>% unnest(model)

The linear model fits better for the Ga group, the exponential one for the Aa group, and nothing for the Za one:

> linear_df
# A tibble: 3 x 13
  Id    r.squared adj.r.squared  sigma  statistic  p.value    df logLik    AIC    BIC deviance df.residual  nobs
  <chr>     <dbl>         <dbl>  <dbl>      <dbl>    <dbl> <dbl>  <dbl>  <dbl>  <dbl>    <dbl>       <int> <int>
1 Aa       0.656         0.637  15.1       34.4   1.50e- 5     1 -81.6  169.   172.   4106.             18    20
2 Ga       1.00          1.00    0.243 104113.    6.10e-32     1   1.01   3.98   6.65    0.942          16    18
3 Za       0.0471       -0.0262 26.7        0.642 4.37e- 1     1 -69.5  145.   147.   9283.             13    15

> exp_df
# A tibble: 3 x 13
  Id    r.squared adj.r.squared  sigma  statistic  p.value    df logLik     AIC    BIC deviance df.residual  nobs
  <chr>     <dbl>         <dbl>  <dbl>      <dbl>    <dbl> <dbl>  <dbl>   <dbl>  <dbl>    <dbl>       <int> <int>
1 Aa      0.999          0.999  0.0624 24757.     1.05e-29     1  28.2  -50.3   -47.4    0.0700          18    20
2 Ga      0.892          0.885  0.219    132.     3.86e- 9     1   2.87   0.264   2.94   0.766           16    18
3 Za      0.00444       -0.0721 0.941      0.0580 8.14e- 1     1 -19.3   44.6    46.7   11.5             13    15

Now, how can I draw the linear regression line for the Aa group, the exponential regression curve for the Ga group, and no curve for the Za group? There is this, but it applies for different regressions built inside the same model type. How can I combine my different objects?

G. Grothendieck · Accepted Answer · 2021-10-07T13:47:44.387

The formula shown below gives the same fitted values as does 3 separate fits for each Id so create the lm objects for each of the two models and then plot the points and the lines for each. The straight solid lines are the linear model and the curved dashed lines are the exponential model.

library(ggplot2)

fm.lin <- lm(Increment ~ Id/Weight + 0, df)
fm.exp <- lm(log(Increment) ~ Id/Weight + 0, df)

df %>%
  ggplot(aes(Weight, Increment, color=Id)) +
  geom_point() +
  geom_line(aes(y = fitted(fm.lin))) +
  geom_line(aes(y = exp(fitted(fm.exp))), lty = 2, lwd = 1)

To only show the Aa fitted lines for the linear model and Ga fitted lines for the exponential model NA out the portions not wanted. In this case we used solid lines for the fitted models.

df %>%
  ggplot(aes(Weight, Increment, color=Id)) +
  geom_point() +
  geom_line(aes(y = ifelse(Id == "Aa", fitted(fm.lin), NA))) +
  geom_line(aes(y = ifelse(Id == "Ga", exp(fitted(fm.exp)), NA)))

Added

Regarding the questions in the comments, the formula used above nests Weight within Id and effectively uses a model matrix which, modulo column order, is a block diagonal matrix whose blocks are the model matrices of the 3 individual models. Look at this to understand it.

model.matrix(fm.lin)

Since this is a single model rather than three models the summary statistics will be pooled. To get separate summary statistics use lmList from the nlme package (which comes with R so it does not have to be installed -- just issue a library statement). The statements below will give objects of class lmList that can be used in place of the ones above as they have a fitted method that will return the same fitted values.

library(nlme)
fm.lin2 <- lmList(Increment ~ Weight | Id, df, pool = FALSE)
fm.exp2 <- lmList(log(Increment) ~ Weight | Id, df, pool = FALSE)

In addition, they can be used to get individual summary statistics. Internally the lmList objects consist of a list of 3 lm objects with attributes in this case so we can extract the summary statistics by extracting the summary statistics from each component.

library(broom)
sapply(fm.lin2, glance)  
sapply(fm.exp2, glance)

One caveat is that common statistical tests between models using different dependent variables, Increment vs. log(Increment), are invalid.

Thanks, that works great! I had to use `na.action=na.exclude` inside the lm function to adapt it to my real dataset to make it work. May I just ask two clarifications? First, I don't really get the formula written like this: `Increment ~ Id/Weight + 0`, how does that read? Second, it looks like the R² and p-value results are different. When I look at `linear_df` I have r² and pvalue for each Id. When I call `summary(fm.lin)`, the coefficients are indeed the same but I have one (general I guess) r² and p-value. How can I obtain results for Aa, Ga, Za? Or should I keep using linear_df for that? — Mata, Oct 07 '21 at 09:01
Thank you again, this really helps. I might be missing something, for the formula, why is there a +0 in fm.lin and fm.exp? Just to fully understand why there, and why not in fm.lin2 and fm.exp2. Also, there was just a log missing there: `fm.exp2 <- lmList(log(Increment) ~ Weight | Id, df, pool = FALSE)` And a final question: is there a way to increase to resolution to draw the exponential curve? I'll have to publish it, and here we see the line segments. Thank you again for your help! — Mata, Oct 07 '21 at 13:34
The 0 removes the overall intercept so that it provides separate intercepts for each group. For fitted values it does not change anything but for coefficients it makes it give the same coefs as 3 separate models. Look at the model.matrix for both situations and compare. In lmList case it is fitting separate models so you don't need 0. Yes, the log was missing. Have fixed. To get good resolution use vector graphics rather than pixel graphics. emf, wmf or svg. — G. Grothendieck, Oct 07 '21 at 13:48
Thank you very much for your elegant solution and associated explanations! — Mata, Oct 07 '21 at 13:57

score 1 · Answer 2 · answered Oct 06 '21 at 19:03

possible solution

Weight=c(12.6,12.6,16.01,17.3,17.7,10.7,17,10.9,15,14,13.8,14.5,17.3,10.3,12.8,14.5,13.5,14.5,17,14.3,14.8,17.5,2.9,21.4,15.8,40.2,27.3,18.3,10.7,0.7,42.5,1.55,46.7,45.3,15.4,25.6,18.6,11.7,28,35,17,21,41,42,18,33,35,19,30,42,23,44,22)
Increment=c(0.55,0.53,16.53,55.47,80,0.08,41,0.1,6.7,2.2,1.73,3.53,64,0.05,0.71,3.88,1.37,3.8,40,3,26.3,29.7,10.7,35,27.5,60,43,31,21,7.85,63,9.01,67.8,65.8,27,40.1,31.2,22.3,35,21,74,75,12,19,4,20,65,46,9,68,74,57,57)
Id=c(rep("Aa",20),rep("Ga",18),rep("Za",15))
df=data.frame(Id,Weight,Increment)

library(tidyverse)
df_model <- df %>%
  group_nest(Id) %>%
  mutate(
    formula = c(
      "lm(log(Increment) ~ Weight, data = .x)",
      "lm(Increment ~ Weight,data = .x)",
      "lm(Increment ~ 0,data = .x)"
    ),
    transform = c("exp(fitted(.x))",
                  "fitted(.x)",
                  "fitted(.x)")
  ) %>%
  mutate(model = map2(data, formula, .f = ~ eval(parse(text = .y)))) %>%
  mutate(fit = map2(model, transform, ~ eval(parse(text = .y)))) %>%
  select(Id, data, fit) %>%
  unnest(c(data, fit))


ggplot(df_model) +
  geom_point(aes(Weight, Increment, color = Id)) +
  geom_line(aes(Weight, fit, color = Id))

^{Created on 2021-10-06 by the reprex package (v2.0.1)}

Thanks @Yuriy, that helps. It works fine on my reproducible example, but I cannot make it work on my real data. I carefully checked the writing, but I get an error "Error: Problem with mutate() column model. i model = map2(data, formula, .f = ~eval(parse(text = .y))). x :2:0: unexpected end of input 1: lm(log(mortal) ~ growth, data=.x" I tried to use the na.exclude in the lm function, but that doesn't solve it. Any idea what I might be missing? Also, how can I not draw the line for Id=Za in your code? I guess I can NA it out as done above, but I cannot test it. — Mata, Oct 07 '21 at 09:08
@G.Grothendieck's solution is just great. I think it's worth using his version for work — Yuriy Saraykin, Oct 07 '21 at 12:12

How to use results from different regression models in a scatterplot built using group_by in R?

2 Answers2

Added