
I have the following data frame, and I need to fit the following regression model separately for each player:

$$\ln(\text{score}_t) = \beta_1 + \beta_2\,\text{time\_playing}_t$$
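In R's formula syntax this model is `log(score) ~ time_playing`. `lm()` reports β₁ as the `(Intercept)` coefficient and β₂ as the `time_playing` coefficient. Here is a minimal sketch for a single player. It uses only the first six rows of player "a" from the glimpse output below, so the numbers are illustrative and are not the full-data estimates.

```r
# First six observations of player "a" (illustration only, not the full data)
one_player <- data.frame(
  time_playing = 1:6,
  score        = c(7, 5, 2, 3, 10, 8)
)

# ln(score)_t = beta_1 + beta_2 * time_playing_t
fit <- lm(log(score) ~ time_playing, data = one_player)

beta_1 <- coef(fit)[["(Intercept)"]]  # intercept
beta_2 <- coef(fit)[["time_playing"]] # slope on time_playing
beta_2
```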

My code and the example data look like this:

```
library(tidyverse)
library(broom)

# The file is semicolon-separated, hence sep = ";"
df_players <- read.csv("https://github.com/rhozon/datasets/raw/master/data_test_players.csv",
                       header = TRUE, sep = ";") %>%
  glimpse()
```

which prints:

```
Rows: 105
Columns: 3
$ player       <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a"…
$ time_playing <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 1,…
$ score        <int> 7, 5, 2, 3, 10, 8, 7, 10, 10, 3, 8, 5, 2, 5, 6, 9, 9, 8, 9, 4, 6, 4, 9, 8, 8, 5, 2, 10, 9, 5, 7, 4, 5, 8, 10, 2, 3, 8, 8, 5, 7, 6, 10…
```

The desired data frame looks something like this:

```
df
  player       beta_2
1      a  0.005958000
2      b -0.004110000
3      c  0.000390777
```

How can I use `lm()` to estimate the beta_2 coefficient for each player and collect the results in a data frame like the one shown above?

user438383
  • Did you use lm for at least a single model? If so, paste it. Here's a hint, though: instead of fitting multiple models, use different levels: log(score) ~ player : time_playing – Ric Oct 13 '22 at 00:22
  • Correction: log(score) ~ player * time_playing – Ric Oct 13 '22 at 00:27
  • Thanks @RicardoVillalba! I used your formula, but the estimated coefficients (alpha and beta) are different from the ones estimated for each player separately. – Rodrigo H. Ozon Oct 13 '22 at 00:39
  • Does this answer your question? [Fitting several regression models with dplyr](https://stackoverflow.com/questions/22713325/fitting-several-regression-models-with-dplyr) – mikebader Oct 13 '22 at 00:48
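About the interaction formula suggested in the comments: with `log(score) ~ player * time_playing`, R's default treatment contrasts make the `time_playing` coefficient the slope of the reference player (`a`), and each `player:time_playing` term the *difference* between that player's slope and player `a`'s. That is why those coefficients do not look like the per-player estimates. Dropping the common intercept, as in `log(score) ~ 0 + player + player:time_playing`, should give one intercept and one slope per player directly, with the same point estimates as separate per-player `lm()` fits (the standard errors differ, because the pooled model uses a single residual variance). The sketch below uses simulated data; the slopes are made up for illustration.

```r
set.seed(42)

# Hypothetical stand-in for df_players: 3 players x 35 periods, made-up slopes
sim <- data.frame(
  player       = rep(c("a", "b", "c"), each = 35),
  time_playing = rep(1:35, times = 3)
)
true_slope <- c(a = 0.006, b = -0.004, c = 0.0004)
sim$score <- exp(1.7 + unname(true_slope[sim$player]) * sim$time_playing +
                   rnorm(nrow(sim), sd = 0.3))

# Interaction model: "time_playing" is player a's slope, and
# "playerb:time_playing" is (slope of b) - (slope of a), and so on
fit_int <- lm(log(score) ~ player * time_playing, data = sim)

# Cell-means form: one intercept and one slope per player, no reference level
fit_cell <- lm(log(score) ~ 0 + player + player:time_playing, data = sim)

beta_2 <- coef(fit_cell)[paste0("player", c("a", "b", "c"), ":time_playing")]
data.frame(player = c("a", "b", "c"), beta_2 = unname(beta_2))
```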

3 Answers


There are several ways to do this. Here is one:

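```
library(tidyverse)  # dplyr, tidyr (nest/unnest) and purrr (map)

# One row per player, with that player's rows nested in a list-column "data"
df <- df_players %>% group_by(player) %>% nest()

# Fit ln(score) = beta_1 + beta_2 * time_playing and return a tidy coefficient table
my_lm <- function(df) {
  lm(log(score) ~ time_playing, data = df) %>% broom::tidy()
}

# Fit one model per player, unnest the coefficient tables, keep only the slope
df %>% mutate(coefs = map(data, my_lm)) %>%
  unnest(coefs) %>% filter(term == "time_playing")
```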
Zhiqiang Wang
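The same per-player regressions can also be run in base R without `nest()`/`map()`: split the data by player, fit one `lm()` per piece, and keep the `time_playing` coefficient. This is a minimal sketch. The small data frame `toy` is a made-up stand-in built from the first five rows of players "a" and "b" in the glimpse output; with the real data you would pass `df_players` instead.

```r
# Made-up stand-in with the question's column names
toy <- data.frame(
  player       = rep(c("a", "b"), each = 5),
  time_playing = rep(1:5, times = 2),
  score        = c(7, 5, 2, 3, 10, 2, 3, 8, 8, 5)
)

# One lm() per player: ln(score) = beta_1 + beta_2 * time_playing
fits <- lapply(split(toy, toy$player),
               function(d) lm(log(score) ~ time_playing, data = d))

# Collect each fit's slope into the requested player / beta_2 layout
betas <- data.frame(
  player    = names(fits),
  beta_2    = vapply(fits, function(m) coef(m)[["time_playing"]], numeric(1)),
  row.names = NULL
)
betas
```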

Most of what you need is in the linked question (Fitting several regression models with dplyr), but here is an answer tailored to your case:

```
library(dplyr)

set.seed(1)  # make the simulation reproducible

## Create data following your structure
n <- 20  # Number of observations per player
N <- 10  # Number of players

# Simulate data: log(score) = beta_2 * time_playing + e_i
df <- tibble(
    player = rep(letters[1:N], each = n),
    time_playing = rnorm(n * N),
    e_i = rnorm(n * N),
    beta_2 = rep(runif(N), each = n),
    score = exp(beta_2 * time_playing + e_i)
)

## Estimate table of betas: one regression of log(score) per player
betatbl <- df %>%
    group_by(player) %>%
    do(regs = lm(log(score) ~ time_playing, data = .data)) %>%
    mutate(
        beta1 = coef(regs)[1],
        beta2 = coef(regs)[2]
    )
```
mikebader
    This will work but FYI `dplyr::do()` has been [deprecated](https://dplyr.tidyverse.org/reference/do.html) since v1.0.0, i.e. June 2020, so it may stop working in the future. – SamR Oct 13 '22 at 16:12
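Since `do()` is no longer the recommended interface, here is a sketch of the same idea with `group_modify()`, which applies a function to each group's rows (`.x`) and row-binds the data frames it returns. It assumes a recent dplyr (1.0 or later) and uses simulated data with made-up slopes.

```r
library(dplyr)

set.seed(1)
# Simulated data in the same shape as the answer above (hypothetical values)
n <- 20  # observations per player
N <- 3   # number of players
df <- tibble(
  player       = rep(letters[1:N], each = n),
  time_playing = rnorm(n * N),
  score        = exp(rep(c(0.5, -0.2, 0.1), each = n) * time_playing +
                       rnorm(n * N, sd = 0.1))
)

# group_modify() calls the function once per player; .x holds that player's rows
betatbl <- df %>%
  group_by(player) %>%
  group_modify(~ {
    fit <- lm(log(score) ~ time_playing, data = .x)
    tibble(beta1 = coef(fit)[[1]], beta2 = coef(fit)[[2]])
  }) %>%
  ungroup()
betatbl
```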

Assuming the input shown in the Note at the end, use `lmList()` from nlme to run the regressions by player and then extract the coefficients. Omit the last line if it is fine to have player as the row names rather than as a column.

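```
library(nlme)      # lmList()
library(tibble)    # rownames_to_column()
library(magrittr)  # %>%

# "| player" fits a separate log(score) ~ time_playing regression for each player
fo <- log(score) ~ time_playing | player
df_players %>%
  lmList(fo, .) %>% 
  coef %>%
  rownames_to_column(var = "player")
```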

giving:

```
  player (Intercept)  time_playing
1      a    1.678156  0.0059581851
2      b    1.732095 -0.0041131361
3      c    1.642926  0.0003907772
```

The following code plots the data together with the fitted regression line for each of the three players:

```
library(lattice)
xyplot(fo, df_players, type = c("p", "r"), as.table = TRUE)
```


Note

```
u <- "https://github.com/rhozon/datasets/raw/master/data_test_players.csv"
df_players <- read.csv2(u)
```
G. Grothendieck
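Using `read.csv2()` in the Note works because it is base R's reader for semicolon-separated files: its defaults are `header = TRUE`, `sep = ";"` and `dec = ","`, so it replaces the explicit `sep = ";"` used in the question. Here is a small self-contained check. It reads made-up rows from inline text instead of the URL, so it runs offline.

```r
# Made-up semicolon-separated rows in the same layout as the CSV
txt <- "player;time_playing;score
a;1;7
a;2;5
b;1;2"

x1 <- read.csv2(text = txt)            # sep = ";" and dec = "," by default
x2 <- read.csv(text = txt, sep = ";")  # the question's approach
identical(x1, x2)
```

With integer columns only, the decimal-separator difference does not matter, so both calls produce the same data frame.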