How to run multiple lm models in R and generate a new df?

Question

I have the data frame below and need to fit the following regression model separately for each player:

\ln(\text{score}_t) = \beta_1 + \beta_2 \, \text{time\_playing}_t

My code and example data frame look like this:

```
library(tidyverse)
library(broom)

# Read the semicolon-separated example data and inspect its structure
df_players <- read.csv("https://github.com/rhozon/datasets/raw/master/data_test_players.csv", header = TRUE, sep = ";") %>%
  glimpse()

#> Rows: 105
#> Columns: 3
#> $ player       <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a"…
#> $ time_playing <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 1,…
#> $ score        <int> 7, 5, 2, 3, 10, 8, 7, 10, 10, 3, 8, 5, 2, 5, 6, 9, 9, 8, 9, 4, 6, 4, 9, 8, 8, 5, 2, 10, 9, 5, 7, 4, 5, 8, 10, 2, 3, 8, 8, 5, 7, 6, 10…
```

The desired data frame looks something like this:

```
df
  player       beta_2
1      a  0.005958000
2      b -0.004110000
3      c  0.000390777
```

How can I use the `lm` function to estimate the beta_2 coefficient for each player and collect the results in a data frame like the one shown above?

Did you use lm at least for a single model? If so, paste it. However, I'm going to give you a clue: instead of multiple models, use different levels: log(score) ~ player : time_playing — Ric, Oct 13 '22 at 00:22
Thanks @RicardoVillalba! I used your formula, but the estimated coefficients (alpha and beta) are different from those estimated separately for each player. — Rodrigo H. Ozon, Oct 13 '22 at 00:39
Does this answer your question? [Fitting several regression models with dplyr](https://stackoverflow.com/questions/22713325/fitting-several-regression-models-with-dplyr) — mikebader, Oct 13 '22 at 00:48
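
The comment exchange above can be made concrete. With `log(score) ~ player:time_playing`, all players share a single intercept, so the slopes will generally differ from those of separate per-player regressions. Giving each player its own intercept as well, for example with `0 + player + player:time_playing`, reproduces the separate fits. A minimal base R sketch on simulated data (the data frame `sim` and its values are made up for illustration and are not the asker's data):

```r
# Simulated stand-in for df_players (hypothetical data, not the asker's file)
set.seed(1)
sim <- data.frame(
  player       = rep(c("a", "b", "c"), each = 35),
  time_playing = rep(1:35, times = 3),
  score        = sample(2:10, 105, replace = TRUE)
)

# Separate regressions: one lm() per player, coefficients collected in a matrix
separate <- sapply(split(sim, sim$player), function(d) {
  coef(lm(log(score) ~ time_playing, data = d))
})

# One model with a shared intercept (the formula from the comment)
shared <- lm(log(score) ~ player:time_playing, data = sim)

# One model with a separate intercept and slope for each player
full <- lm(log(score) ~ 0 + player + player:time_playing, data = sim)

slopes_separate <- separate["time_playing", ]
slopes_full     <- coef(full)[paste0("player", c("a", "b", "c"), ":time_playing")]

# The fully interacted model matches the per-player fits
all.equal(unname(slopes_separate), unname(slopes_full))
```

Comparing `coef(shared)` with `slopes_separate` shows the discrepancy described in the second comment: the shared-intercept model has four coefficients (one intercept, three slopes) rather than three intercept/slope pairs.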

score 1 · Answer 1 · answered Oct 13 '22 at 03:37

There might be several ways to do it. This is one of them:

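```
# Nest the data: one row per player, with that player's rows in `data`
df <- df_players %>% group_by(player) %>% nest()

# Fit the model for one player's data and tidy the coefficients
# (log(score) is used to match the model in the question)
my_lm <- function(df) {
    lm(log(score) ~ time_playing, data = df) %>% broom::tidy()
}

# Fit per player, then keep only the slope on time_playing
df %>% mutate(coefs = map(data, my_lm)) %>%
    unnest(coefs) %>% filter(term == "time_playing")
```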

score 0 · Answer 2 · answered Oct 13 '22 at 00:53

Most of what you need is in this solution, but here is an answer tailored to your case:

```
library(dplyr)

## Create data following your structure
n <- 20  # Number of observations per player
N <- 10  # Number of players

# Simulate data
df <- tibble(
    player = rep(letters[1:N], each = n),
    time_playing = rnorm(n * N),
    e_i = rnorm(n * N),
    beta_2 = rep(runif(N), each = n),
    score = exp(beta_2 * time_playing + e_i)
)

## Estimate table of betas
# log(score) is the response, matching the model in the question
betatbl <- df %>%
    group_by(player) %>%
    do(regs = lm(log(score) ~ time_playing, data = .data)) %>%
    mutate(
        beta1 = coef(regs)[1],
        beta2 = coef(regs)[2]
    )
```

This will work, but FYI `dplyr::do()` has been marked as [superseded](https://dplyr.tidyverse.org/reference/do.html) since v1.0.0 (released in 2020), so it is no longer the recommended approach. — SamR, Oct 13 '22 at 16:12
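
Following up on the comment above, one replacement for `do()` suggested in the dplyr documentation is `nest_by()`. The sketch below builds the same kind of table of betas without `do()`, assuming dplyr 1.0.0 or later; the simulated `sim` data and its values are illustrative only and are not taken from the answer:

```r
library(dplyr)

# Simulated data (hypothetical: 3 players, 20 observations each)
set.seed(42)
sim <- tibble(
  player       = rep(c("a", "b", "c"), each = 20),
  time_playing = rep(1:20, times = 3),
  score        = exp(0.05 * time_playing + rnorm(60))
)

betatbl <- sim %>%
  nest_by(player) %>%   # one row per player; that player's rows sit in list column `data`
  mutate(regs = list(lm(log(score) ~ time_playing, data = data))) %>%
  summarise(
    beta1 = coef(regs)[[1]],   # intercept
    beta2 = coef(regs)[[2]],   # slope on time_playing
    .groups = "drop"
  )

betatbl
```

Because `nest_by()` returns a row-wise data frame, `regs` inside `summarise()` refers to the single fitted model for that player, so `coef()` can be applied to it directly.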

G. Grothendieck · Accepted Answer · 2022-10-14T12:24:43.563

Assuming the input shown in the Note at the end, use `lmList` to run the regressions by player and then extract the coefficients. Omit the last line if it is OK to have player as the row names instead of a column.

```
library(nlme)
library(tibble)

# "| player" tells lmList to fit a separate regression for each player
fo <- log(score) ~ time_playing | player
df_players %>%
  lmList(fo, .) %>% 
  coef %>%
  rownames_to_column(var = "player")
```

giving:

```
  player (Intercept)  time_playing
1      a    1.678156  0.0059581851
2      b    1.732095 -0.0041131361
3      c    1.642926  0.0003907772
```
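
For comparison, the same kind of table can be produced without extra packages by splitting the data by player and fitting `lm()` to each piece. This is a minimal base R sketch rather than part of the answer above; `sim` is simulated stand-in data and `fit_one` is a hypothetical helper name:

```r
# Simulated stand-in for df_players (hypothetical values)
set.seed(123)
sim <- data.frame(
  player       = rep(c("a", "b", "c"), each = 35),
  time_playing = rep(1:35, times = 3),
  score        = sample(2:10, 105, replace = TRUE)
)

# Fit log(score) ~ time_playing on one player's rows and return the slope
fit_one <- function(d) {
  unname(coef(lm(log(score) ~ time_playing, data = d))["time_playing"])
}

# Apply the helper to each player's subset and assemble the result
slopes <- vapply(split(sim, sim$player), fit_one, numeric(1))
result <- data.frame(player = names(slopes), beta_2 = unname(slopes))
result
```

With the real data, replacing `sim` by `df_players` should give a two-column data frame in the shape requested in the question.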

The following code plots the data and the fitted regression line for each of the three players.

```
library(lattice)
xyplot(fo, df_players, type = c("p", "r"), as.table = TRUE)
```

Note

```
u <- "https://github.com/rhozon/datasets/raw/master/data_test_players.csv"
df_players <- read.csv2(u)
```
