Finding slope over time for each participant in a long-form data set

Question

I would like to compute the linear regression and maximum likelihood slopes for each participant. This fine response explains how to do that for wide-form data, but mine are "long-form" longitudinal data, similar to Singer & Willett's data on alcohol use among teens:

alcohol1 <- read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", header = TRUE, sep = ",")

For example, I would like to determine the linear regression (OLS) and maximum likelihood (MLE) slopes of alcuse across age for each id in the alcohol1 data set.

The output can be either another data frame in which each id has a corresponding slope, or a column added to the original alcohol1 data that repeats the participant's slope on each of their rows.

As in Singer & Willett's data, my participants do not all have the same number of occasions, and some have missing data, so I would like to account for that as well.

It's not clear to me *what* you want to model. What is your response, what is (are) the predictor(s)? Can you edit your post to include the explicit linear model? — Maurits Evers, May 06 '19 at 00:44
See answers like this: https://stackoverflow.com/a/33870137/1222578 . Beyond your question about how to achieve this, I think you should also look into mixed/hierarchical linear models with random slopes, rather than fitting completely separate regressions to each participant. — Marius, May 06 '19 at 00:44
Thank you, @Marius. I am not facile with `dplyr`--nor its use here. You are right that the "actual" analyses will focus on MLMs, but for this purpose my colleagues and I want to look more exploratively at individuals based on their slopes. — wes, May 06 '19 at 00:49
If you're not used to `dplyr` then some of the other answers to the same question might be better, e.g. the answer that suggests using `lme4::lmList` (it's from a mixed effects package but that particular function does standard regressions). — Marius, May 06 '19 at 00:52
So run a GLM for each id, e.g. `alcohol1.id1 <- alcohol1[which(alcohol1$id == 1), ]`, etc.? If I understand correctly, `lmList` is set up better for wide-form data. But I am not an expert on any of this! — wes, May 06 '19 at 00:59
`lmList` is set up for long-form data. In the linked example, `state` is equivalent to `id` in your data. — Marius, May 06 '19 at 01:07

score 3 · Accepted Answer · answered May 06 '19 at 00:59

If you aren't opposed to using tidyverse functions:

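library(dplyr)  # provides group_by(), summarize(), and %>%

# Toy data: three participants, three ages each, random alcohol-use values
dat <- data.frame(id = c(rep("id1", 3), rep("id2", 3), rep("id3", 3)),
                  age = rep(c(14, 15, 16), 3),
                  alc.use = pi + rnorm(n = 9, mean = 0.5, sd = 1))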
head(dat)
   id age  alc.use
1 id1  14 3.887784
2 id1  15 5.388763
3 id1  16 3.348683
4 id2  14 3.624546
5 id2  15 4.494489
6 id2  16 5.103788

group_by(dat, id) %>% summarize(b0 = coef(lm(alc.use ~ age))[1],
                                b1 = coef(lm(alc.use ~ age))[2])

# A tibble: 3 x 3
  id        b0     b1
  <fct>  <dbl>  <dbl>
1 id1     8.25 -0.270
2 id2    -6.69  0.740
3 id3    21.1  -1.14
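
On the MLE half of the question: under the usual assumption of independent, normally distributed errors, the maximum likelihood estimate of the slope coincides with the OLS estimate, so `lm()` already returns it (only the estimate of the residual variance differs). A minimal sketch with made-up numbers for a single participant, using `glm()` with a Gaussian family as the likelihood-based fit; the values and variable names below are hypothetical:

```r
# Made-up data for one participant (hypothetical values, not from alcohol1)
age    <- c(14, 15, 16, 17)
alcuse <- c(1.2, 0.8, 2.1, 2.9)

# OLS slope from lm()
ols_slope <- coef(lm(alcuse ~ age))[["age"]]

# Maximum likelihood slope from a Gaussian GLM with identity link
mle_slope <- coef(glm(alcuse ~ age, family = gaussian()))[["age"]]

# The two estimates agree up to numerical tolerance
all.equal(ols_slope, mle_slope)
```

If the intended "MLE" model assumes something other than normal errors, the slopes would generally differ and the `family` argument would need to change accordingly.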

answered May 06 '19 at 00:59

Dij

Thank you, @Dij. Is there a way to allow for `id`s with different numbers of repeated measurements? I.e., when different `id`s have different numbers of waves of `alcuse`? – wes May 06 '19 at 01:06
Yes, `group_by` handles that internally. After the grouping you could have 4 time/age points for participant 1, 3 data points for the next participant, etc. When you call `summarise`, it produces one row per group, no matter how many rows that group contains. – Dij May 06 '19 at 01:16
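
A minimal sketch of the unequal-waves case, assuming made-up data (the values and the names `dat2`, `slopes`, and `dat2_with_slope` are hypothetical, not from alcohol1). By default `lm()` drops rows containing `NA`, so missing `alcuse` values are skipped within each `id`. An `id` with only one complete observation would get an `NA` slope, and an `id` with none would make `lm()` fail, so such cases may need filtering first.

```r
library(dplyr)

# Hypothetical long-form data: ids have 4, 3, and 2 waves; one alcuse value is missing
dat2 <- data.frame(
  id     = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  age    = c(14, 15, 16, 17, 14, 15, 16, 14, 16),
  alcuse = c(0, 1, 2, 3, 1, NA, 3, 2, 2)
)

# One row per id: number of complete observations and the fitted slope
slopes <- dat2 %>%
  group_by(id) %>%
  summarise(
    n_obs = sum(!is.na(alcuse) & !is.na(age)),
    slope = coef(lm(alcuse ~ age))[["age"]]  # lm() omits rows with NA by default
  )

# Optionally repeat each participant's slope on every row of the original data
dat2_with_slope <- left_join(dat2, slopes, by = "id")
```

`slopes` covers the "separate data frame" form of the requested output and `dat2_with_slope` the "extra column" form. For these made-up values the slopes are 1, 1, and 0, up to floating-point error.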
