I have a repeated measures sample in which each participant was asked to complete a sleep survey over the course of 5 years (baseline through year 4 of follow-up). The survey items are fairly correlated (e.g. when you go to bed correlates with your duration of sleep), so we are interested in taking a PCA-like approach and using the loadings on each PC to create a time-varying composite score (i.e. a composite score at each time point based on these hypothetical "PC loadings"). We then want to use each participant's time-varying composite score as a predictor in a mixed model for our longitudinal outcome of interest.
We initially performed PCA on all of the data pooled together (all participants and all time points), which makes some assumptions, notably that every participant-visit row is treated as an independent observation. Upon further reflection, I started to question whether PCA can distinguish between participant-level and time-level variability. So I am looking for a comparable dimensionality reduction approach that is suited to a repeated measures sample.
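For concreteness, the pooled PCA described above amounts to something like the following. This is only a minimal sketch on simulated data: the names pooled_dat, items and composite are hypothetical, and var1 to var3 stand in for the real survey items.

```r
# Simulated long-format data: one row per participant-visit (hypothetical values)
set.seed(123)
pooled_dat <- data.frame(
  ID    = rep(1:4, each = 4),
  visit = rep(c("baseline", "y1", "y2", "y3"), times = 4),
  var1  = rnorm(16),
  var2  = rnorm(16)^2,
  var3  = rnorm(16, mean = 3, sd = 1)
)

items <- c("var1", "var2", "var3")

# Pooled PCA: every row is treated as an independent observation,
# so ID and visit play no role in the decomposition
pca_pooled <- prcomp(pooled_dat[, items], center = TRUE, scale. = TRUE)

# Loadings (one row per survey item, one column per PC)
pca_pooled$rotation

# Composite score per participant-visit: the score on the first PC
pooled_dat$composite <- pca_pooled$x[, 1]
head(pooled_dat)
```

Because ID and visit never enter the prcomp() call, between-participant and within-participant variability are mixed together in the components, which is the concern raised above.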
Based on some previous Stack Exchange questions I found, it looks like multiple factor analysis (MFA) might be a good option, but none of the examples I have seen online involve longitudinal data.
1. Does MFA seem like the correct approach?
2. If so, is the code below a correct way to run it with the FactoMineR package?
Below is a sample dataset illustrating the structure and code I think I'd run:
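library(FactoMineR)
library(dplyr)   # provides the %>% pipe
library(tidyr)   # provides pivot_wider()

set.seed(123)
# Long format: one row per participant (ID) and visit
ex_dat <- data.frame(ID = rep(1:4, each = 4),
                     visit = rep(c("baseline", "y1", "y2", "y3"), 4),
                     var1 = rnorm(16),
                     var2 = rnorm(16)^2,
                     var3 = log(rnorm(16, mean = 3, sd = 1)))
# Wide format: one row per participant, one column per variable-visit pair
dat <- ex_dat %>%
  pivot_wider(id_cols = ID,
              names_from = visit,
              values_from = c("var1", "var2", "var3")) %>%
  data.frame()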
> dat
ID var1_baseline var1_y1 var1_y2 var1_y3 var2_baseline var2_y1 var2_y2 var2_y3 var3_baseline var3_y1 var3_y2 var3_y3
1 1 -0.5604756 -0.2301775 1.5587083 0.07050839 0.2478551 3.86758304 0.4919001 0.22353172 1.3597259 1.3553540 1.3406642 1.3052579
2 2 0.1292877 1.7150650 0.4609162 -1.26506123 1.1402475 0.04751306 1.0526851 0.53128242 1.2680506 1.0777591 0.9910409 0.9629945
3 3 -0.6868529 -0.4456620 1.2240818 0.35981383 0.3906741 2.84493432 0.7018871 0.02352331 0.8352078 1.0267878 0.5507789 1.6426707
4 4 0.4007715 0.1106827 -0.5558411 1.78691314 1.2953557 1.57205186 0.1818717 0.08706718 1.4369784 0.6296169 0.9544013 0.9295404
So, for each participant I have multiple measurements of variables 1 through 3.
My hunch, based on the MFA manual, is to run code like the following. I'm guessing this treats the four repeated measurements of each of var1, var2, and var3 as their own "group".
# MFA Analysis
res_MFA <- MFA(dat[, -1], group=rep(4, 3), type=rep("s", 3))
And lastly, does res_MFA$ind$coord give me the equivalent of a "loading" for each dimension?
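To make that last question concrete, these are the parts of the MFA output I am comparing. This is a sketch under stated assumptions, not a confirmed answer: it uses a larger simulated sample (20 hypothetical participants, with sim_long, sim_wide and res_sim as illustrative names) so that there are enough rows to extract several dimensions, and it drops the log() on var3 to avoid any risk of NaN values.

```r
library(FactoMineR)
library(tidyr)

set.seed(123)
n_id   <- 20                                  # hypothetical number of participants
visits <- c("baseline", "y1", "y2", "y3")

# Simulated long-format data: one row per participant-visit
sim_long <- data.frame(
  ID    = rep(seq_len(n_id), each = length(visits)),
  visit = rep(visits, times = n_id),
  var1  = rnorm(n_id * length(visits)),
  var2  = rnorm(n_id * length(visits))^2,
  var3  = rnorm(n_id * length(visits), mean = 3, sd = 1)
)

# Wide format: columns ordered var1_*, var2_*, var3_* (4 visits each)
sim_wide <- as.data.frame(
  pivot_wider(sim_long, id_cols = ID, names_from = visit,
              values_from = c(var1, var2, var3))
)

# Three groups of four standardized columns, one group per survey variable
res_sim <- MFA(sim_wide[, -1], group = rep(4, 3), type = rep("s", 3),
               name.group = c("var1", "var2", "var3"), graph = FALSE)

dim(res_sim$ind$coord)         # one row per participant
dim(res_sim$quanti.var$coord)  # one row per variable-visit column
head(res_sim$eig)              # eigenvalues and percentage of variance
```

Judging by the dimensions alone, ind$coord has one row per participant, which looks more like a set of scores, while quanti.var$coord has one row per variable-visit column, which looks closer to what I would call loadings. Note also that in this wide layout each participant gets a single coordinate per dimension rather than one per visit.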