
I have a data frame with post and follow-up measurements for approximately 200 people. In the study, we try to find out if there is a correlation between sports participation and distress symptoms. We have two measurement periods (post and follow-up), both conducted after a workshop about health and sports: post was conducted 6 months after the workshop, and follow-up one year after the workshop. We formed the following hypothesis: "Participation in sport for obese people within one year after a workshop correlates significantly positively with psychological distress symptoms at follow-up." I assume the dependent variable is psychological distress and the independent variable is participation in sports activities. The data structure looks like:

Df
$ measurement_period    : Factor w/ 2 levels "0","1": 1 1 1 1
$ psychological_distress: int 12 45 32 85
$ participation         : Factor w/ 2 levels "0","1": 1 1 1 1
$ id                    : num 1 2 3 4

After reading some posts here, we believe that there are 2 levels in the model: 1) measurement period (post and follow-up) and 2) id.

At first we conducted an unconditional model (an intercept-only model, to confirm whether a multilevel model fits; we hope this is right) with the following code:

library(lme4)
test <- lmer(psychological_distress ~ 1 + (1 | id), data = Df)

But we are not sure if the model is appropriate given the data structure, and whether the level 1 and level 2 classification is correct.

Thank you very much in advance!

Timon Doo

2 Answers


Your model:

lmer(psychological_distress ~ 1 + (1 | id), data = Df)

is a variance components model. It will tell you how much of the variation in psychological_distress is attributable to the id level, and how much is attributable to the unit/residual level. That isn't going to answer your research question:

we try to find out if there is a correlation between sports participation and distress symptoms
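
The variance components model is still a useful first step, though: if the id-level share of the total variance (the intraclass correlation, ICC) is non-negligible, that supports using a multilevel model at all. Here is a minimal sketch of computing it by hand (assuming your Df from the question; the icc() function in the performance package gives the same number):

library(lme4)

m0 <- lmer(psychological_distress ~ 1 + (1 | id), data = Df)

vc <- as.data.frame(VarCorr(m0))          # variance components
var_id  <- vc$vcov[vc$grp == "id"]        # between-person variance
var_res <- vc$vcov[vc$grp == "Residual"]  # within-person (residual) variance
var_id / (var_id + var_res)               # ICC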

To answer the research question itself, you need to include the participation variable as a fixed effect, along with the time variable and their interaction. So in the first instance I would consider this:

lmer(psychological_distress ~ measurement_period * participation + (1 | id), data = Df)
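
For completeness, a sketch of fitting and inspecting this model (using the variable names from your question; summary() and confint() are standard methods for lme4 fits):

fit <- lmer(psychological_distress ~ measurement_period * participation + (1 | id),
            data = Df)
summary(fit)  # fixed-effect estimates, standard errors and t-values
confint(fit)  # profile confidence intervals for all parameters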
Robert Long
  • Thank you for your answer. I have one question: I'm not sure why you included the interaction between participation and measurement_period. Is it necessary? Thank you in advance. – Timon Doo May 07 '22 at 20:02
  • You're welcome. That's a good question. It's quite possible that the interaction is not relevant in your particular case. I just tend to include it by default, in case there is a difference in the association between groups. – Robert Long May 07 '22 at 20:33

A good website on how to fit longitudinal and growth models using lme4 is https://rpsychologist.com/r-guide-longitudinal-lme-lmer

As Robert pointed out, and as demonstrated on the website, it is often useful to fit an interaction between "time" and "group" (e.g., treatment vs. control), to see how the outcome changes for each group over time. You can see this change by looking at the coefficients, but it's usually easier to plot (adjusted) predictions.

Here's a toy example:

library(parameters)
library(datawizard)
library(lme4)
library(ggeffects)
data("qol_cancer")

# filter two time points
qol_cancer <- data_filter(qol_cancer, time %in% c(1, 2))

# create fake treatment/control variable
set.seed(123)
treatment <- sample(unique(qol_cancer$ID), size = length(unique(qol_cancer$ID)) / 2, replace = FALSE)
qol_cancer$treatment <- 0
qol_cancer$treatment[qol_cancer$ID %in% treatment] <- 1

qol_cancer$time <- as.factor(qol_cancer$time)
qol_cancer$treatment <- factor(qol_cancer$treatment, labels = c("control", "treatment"))

m <- lmer(QoL ~ time * treatment + (1 + time | ID),
          data = qol_cancer,
          # each ID contributes only two rows, so the number of random effects
          # per ID equals the number of observations; relax lme4's safety check
          control = lmerControl(check.nobs.vs.nRE = "ignore"))

model_parameters(m)
#> # Fixed Effects
#> 
#> Parameter                        | Coefficient |   SE |         95% CI | t(368) |      p
#> ----------------------------------------------------------------------------------------
#> (Intercept)                      |       70.74 | 2.15 | [66.52, 74.97] |  32.90 | < .001
#> time [2]                         |        0.27 | 2.22 | [-4.10,  4.64] |   0.12 | 0.905 
#> treatment [treatment]            |        4.88 | 3.04 | [-1.10, 10.86] |   1.60 | 0.110 
#> time [2] * treatment [treatment] |        1.95 | 3.14 | [-4.23,  8.13] |   0.62 | 0.535 
#> 
#> # Random Effects
#> 
#> Parameter                 | Coefficient
#> ---------------------------------------
#> SD (Intercept: ID)        |       15.14
#> SD (time2: ID)            |        7.33
#> Cor (Intercept~time2: ID) |       -0.62
#> SD (Residual)             |       14.33
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald t-distribution approximation.

ggpredict(m, c("time", "treatment")) |> plot()

Regarding the statistical significance of the interaction term: the p-values from the summary might be misleading. If you're really interested in statistically significant differences, either between time points or between groups (treatment vs. control), it is recommended to calculate pairwise contrasts with associated p-values. You can do this, e.g., with the emmeans package.

library(emmeans)
emmeans(m, c("time", "treatment")) |> contrast(method = "pairwise", adjust = "none")
#>  contrast                          estimate   SE  df t.ratio p.value
#>  time1 control - time2 control       -0.266 2.22 186  -0.120  0.9049
#>  time1 control - time1 treatment     -4.876 3.04 186  -1.604  0.1105
#>  time1 control - time2 treatment     -7.092 2.89 316  -2.453  0.0147
#>  time2 control - time1 treatment     -4.610 2.89 316  -1.594  0.1118
#>  time2 control - time2 treatment     -6.826 2.73 186  -2.497  0.0134
#>  time1 treatment - time2 treatment   -2.216 2.22 186  -0.997  0.3199
#> 
#> Degrees-of-freedom method: kenward-roger


Here you can see, e.g., that treatment and control do not differ regarding their QoL at time point 1, but they do at time point 2.

Daniel
  • Nice example (+1) :) – Robert Long May 09 '22 at 04:23
  • Thank you both very much for the input and example; that helped a lot in understanding MLM and my data. – Timon Doo May 11 '22 at 08:55
  • So, Daniel, in your case your output would mean that the effect of time as a predictor of QoL is not significant (p > .05), indicating that QoL does not change significantly over the two time periods. Regarding the interaction between time and treatment, you can say that it is not significant (p > .05) either, indicating that the outcome QoL does not change significantly for each group (both treatment and control group) over time. Am I right? – Timon Doo May 11 '22 at 08:55
  • I still haven't grasped the difference between a conditional growth model dropping the random slope (lmer(QoL ~ time * treatment + (1 | ID), data = data)) and a conditional growth model adding the random slope (lmer(QoL ~ time * treatment + (time | ID), data = data)). How did you know in your data that you can get rid of a random slope? Or did you include it in your code by indicating (1 + time | ID)? – Timon Doo May 11 '22 at 08:55
  • I added an example regarding statistically significant differences. I hope that clarifies the question where we find significant changes in QoL either over time or between groups. – Daniel May 22 '22 at 17:33
  • "Or did you include it in your code by indicating (1 + time | ID)?" Yes, this means that I added "time" as a random slope. For longitudinal data, it is reasonable to assume that the possible amount of change in the outcome varies between persons; that's when you include "time" as a random slope. A good website that explains this well is http://mfviz.com/hierarchical-models/ (see the sketch after these comments for one way to compare the two specifications). – Daniel May 22 '22 at 17:37
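
To make the model comparison from the last comments concrete, here is a minimal sketch (assuming the qol_cancer example from the answer above) of one common way to decide between the two specifications: fit both with REML and compare them with a likelihood-ratio test, keeping refit = FALSE in anova() because only the random effects differ. Note that the test is conservative here, since the slope variance is tested on the boundary of its parameter space.

# random intercept only vs. random intercept + random slope for time
m_int <- lmer(QoL ~ time * treatment + (1 | ID), data = qol_cancer)
m_slp <- lmer(QoL ~ time * treatment + (1 + time | ID),
              data = qol_cancer,
              control = lmerControl(check.nobs.vs.nRE = "ignore"))

# likelihood-ratio test of the random slope (REML fits, so refit = FALSE)
anova(m_int, m_slp, refit = FALSE)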