I have a repeated measures sample, where each participant was asked to complete a sleep survey over the course of 5 years (baseline through year 4 of follow-up). The survey items are fairly correlated (e.g. when you go to bed correlates with your duration of sleep), so we are interested in taking a PCA-like approach and using the loadings on each PC to create a time-varying composite score (i.e. a composite score computed from these hypothetical "PC" "loadings" at each time point). We then want to use each participant's time-varying composite score in a mixed model to predict our longitudinal outcome of interest.

We initially performed PCA on all of the data (all participants and all time points pooled together), which carries a few assumptions. Upon further reflection, I started to question whether that PCA can distinguish between participant-level and time-level variability. So I am looking for a way to perform a similar dimensionality reduction on a repeated measures sample.
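One way to probe the participant-versus-time concern is to split each item into a between-participant part (the participant's mean) and a within-participant part (each visit's deviation from that mean), then run a PCA on each part. The sketch below only illustrates that idea on simulated data, using base R. The data frame `long` and its columns are hypothetical stand-ins, not the real survey.

```r
# Hypothetical long-format data: 4 participants x 4 visits, 3 survey items
set.seed(123)
long <- data.frame(ID   = rep(1:4, each = 4),
                   var1 = rnorm(16),
                   var2 = rnorm(16),
                   var3 = rnorm(16))
items <- c("var1", "var2", "var3")

# Between-participant part: each participant's mean, repeated on every row
between <- sapply(long[items], function(x) ave(x, long$ID))

# Within-participant part: each visit's deviation from that participant's mean
within <- as.matrix(long[items]) - between

# Separate PCAs on the two sources of variability
pca_between <- prcomp(between, center = TRUE, scale. = FALSE)
pca_within  <- prcomp(within,  center = TRUE, scale. = FALSE)

# Proportion of variance per component for the within-participant part
summary(pca_within)$importance
```

If the loadings of `pca_between` and `pca_within` look very different, that would suggest a pooled PCA is mixing two distinct structures.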

Based on some previous stack questions I found, it looks like MFA might be a good option. But all of the examples I see online don't include longitudinal analysis.

1. Does MFA seem like the correct approach?

2. And if so, is the following code for library(FactoMineR) correct?

Below is a sample dataset illustrating the structure and code I think I'd run:

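library(FactoMineR)
library(dplyr)   # provides %>%
library(tidyr)   # provides pivot_wider()

set.seed(123)
# Long format: one row per participant per visit
ex_dat <- data.frame(ID = rep(1:4, each=4),
                     visit = rep(c("baseline", "y1", "y2", "y3"), 4),
                     var1 = rnorm(16),
                     var2 = rnorm(16)^2,
                     var3 = log(rnorm(16, mean=3, sd=1)))
# Wide format: one row per participant, one column per variable-visit pair
dat <- ex_dat %>%
  pivot_wider(id_cols = ID,
              names_from = visit,
              values_from = c("var1", "var2", "var3")) %>%
  data.frame()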
> dat
  ID var1_baseline    var1_y1    var1_y2     var1_y3 var2_baseline    var2_y1   var2_y2    var2_y3 var3_baseline   var3_y1   var3_y2   var3_y3
1  1    -0.5604756 -0.2301775  1.5587083  0.07050839     0.2478551 3.86758304 0.4919001 0.22353172     1.3597259 1.3553540 1.3406642 1.3052579
2  2     0.1292877  1.7150650  0.4609162 -1.26506123     1.1402475 0.04751306 1.0526851 0.53128242     1.2680506 1.0777591 0.9910409 0.9629945
3  3    -0.6868529 -0.4456620  1.2240818  0.35981383     0.3906741 2.84493432 0.7018871 0.02352331     0.8352078 1.0267878 0.5507789 1.6426707
4  4     0.4007715  0.1106827 -0.5558411  1.78691314     1.2953557 1.57205186 0.1818717 0.08706718     1.4369784 0.6296169 0.9544013 0.9295404

So, for each participant I have multiple measurements of variables 1 through 3.

My hunch, based on the MFA manual, is to run code like this to perform MFA. I'm guessing this treats the var1, var2, and var3 columns as three separate "groups", one per variable, each containing that variable's four visits.

# MFA Analysis
res_MFA <- MFA(dat[, -1], group=rep(4, 3), type=rep("s", 3))
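For comparison, the `group` argument of `MFA()` describes consecutive blocks of columns, so grouping by visit instead of by variable means reordering the columns first. The sketch below shows that reordering on simulated wide data. The data frame `wide`, its 20 participants, and the choice to group by visit are assumptions for illustration only, and the `MFA()` call runs only if FactoMineR is installed. Whether grouping by variable or by visit is appropriate depends on the question being asked.

```r
vars   <- c("var1", "var2", "var3")
visits <- c("baseline", "y1", "y2", "y3")

# Variable-major order, as produced by pivot_wider() above:
# var1_baseline, var1_y1, ..., var3_y3
by_variable <- as.vector(outer(visits, vars,
                               function(v, x) paste(x, v, sep = "_")))

# Visit-major order: var1_baseline, var2_baseline, var3_baseline, var1_y1, ...
by_visit <- as.vector(outer(vars, visits, paste, sep = "_"))

# Hypothetical wide data with 20 participants (simulated, not the real survey)
set.seed(123)
wide <- as.data.frame(matrix(rnorm(20 * 12), nrow = 20,
                             dimnames = list(NULL, by_variable)))

# Reorder columns so each visit forms one consecutive block of 3 items
wide_by_visit <- wide[, by_visit]

if (requireNamespace("FactoMineR", quietly = TRUE)) {
  # 4 groups (one per visit), each with 3 standardized quantitative variables
  res_MFA_visit <- FactoMineR::MFA(wide_by_visit,
                                   group = rep(3, 4),
                                   type = rep("s", 4),
                                   name.group = visits,
                                   graph = FALSE)
}
```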

And lastly... does res_MFA$ind$coord give me the equivalent of a "loading" for each dimension?
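To make the distinction in this question concrete: in ordinary PCA, loadings are per-variable weights, while scores are per-observation coordinates obtained by projecting the data onto those weights. As far as I understand the FactoMineR output, `res_MFA$ind$coord` holds the individuals' coordinates (the analogue of scores), and the variable coordinates sit under `res_MFA$quanti.var$coord`. The base R sketch below uses `prcomp()` on simulated data (the matrix `X` is hypothetical) to show the scores-versus-loadings relationship.

```r
set.seed(1)
# Hypothetical standardized data: 10 observations, 4 variables
X <- scale(matrix(rnorm(40), nrow = 10, ncol = 4))
pca <- prcomp(X, center = FALSE, scale. = FALSE)

loadings <- pca$rotation   # one row per variable: weights defining each PC
scores   <- pca$x          # one row per observation: position on each PC

# Scores are the data projected onto the loadings
recomputed <- X %*% loadings
all.equal(unname(scores), unname(recomputed))
```

A composite score for new rows would be built the same way, by multiplying identically standardized data by the chosen column(s) of `loadings`.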

Jess G
  • 1
    Would using a library that can construct a composite PCA score be helpful? You can make an elbow plot to check how much variance is captured by the PCs and then proceed. – Death Metal Aug 17 '20 at 16:27
  • 3
    @DeathMetal thanks for your comment! I've performed PCA on the datasets in long form (throwing all participants and all time points into the PCA) and have looked at the elbow; there's not a huge kink, unfortunately. But my main concern is whether I should be worried about the fact that there are multiple rows for each participant. I'm definitely interested in a composite PCA score, but I just want to make sure that I'm accurately accounting for the sort of pseudo-replication problem I have here. – Jess G Aug 17 '20 at 17:37
  • 1
    I see; we need a library that constructs a composite PCA for replicates. There's an R library that constructs a composite PCA score. I'm not sure if it allows replicates, which is part of the problem you're facing. Second, for the kink in the PCs, I think you can use the PCs until the variance decay gets saturated. No harm in capturing N number of PCs as long as the variance is captured. – Death Metal Aug 17 '20 at 17:43
  • 1
    @DeathMetal good point! :) Thank you for your input! – Jess G Aug 17 '20 at 17:55
