Looping/sapply through nlme function

Question

I am trying to execute a loop with mixed-model effects with response variable changing. I came from here and here. I should say that I have tried sthg creating a function and then sapply or lapply (wihtout success)

I provide a small dataset (really small) just to represent my original database (much larger and similar to those of longitudinal studies)

data<- structure(list(paciente = structure(c(6134, 6099, 6457, 6164, 
6470, 6323, 6550, 6082, 6476, 6044, 6509, 6539, 6234, 6555, 6383, 
6127, 6507, 6513, 6486, 6080, 6101, 6007, 6023, 6516, 6001, 6198, 
6510, 6530, 6351, 6181), label = "Paciente", format.spss = "F6.0"), 
    edad_s1 = structure(c(70, 63, 61, 71, 67, 59, 63, 69, 67, 
    67, 67, 72, 65, 72, 63, 65, 60, 64, 56, 63, 57, 62, 72, 60, 
    72, 63, 72, 68, 66, 71), label = "Edad", format.spss = "F3.0"), 
    sexo_s1 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 
    2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L, 2L), .Label = c("Hombre", "Mujer"), label = "Sexo", class = "factor"), 
    time = c(2, 1, 2, 1, 0, 0, 1, 0, 2, 1, 1, 0, 1, 2, 1, 2, 
    1, 2, 0, 1, 1, 0, 2, 1, 0, 2, 1, 2, 2, 0), grupo_int_v00 = structure(c(1L, 
    1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 
    2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), .Label = c("A", 
    "B"), label = "Grupo de intervención", class = "factor"), 
    peso1 = c(108, 80.4, 95, 75, 92.6, 90, 82.2, 94.4, 78, 71.3, 
    75.1, 83.5, 87.1, 63, 73, 98.5, 90.2, 81.3, 93.4, 79.8, 114.3, 
    110.9, 81.5, 88.5, 82.4, 88.3, 90, 73, 79, 94.8), cintura1 = c(127, 
    100.5, 103.5, 108, 115, 114.5, 95.5, 115, 101, 98, 99, 108.5, 
    105, 99, 104, 126, 114.2, 99, 110, 104.5, 120, 126, 111.5, 
    102, 117, 110, 125, 100, 104, 123), tasis2_e = c(156, 129, 
    131, 138, 167, 138, 115, 146, 119, 148, 130, 144, 115, 122, 
    135, 139, 128, 119, 138, 115, 138, 151, 151, NA, 137, 147, 
    124, 168, 152, 156), tadias2_e = c(70, 63, 80, 67, 76, 81, 
    57, 68, 69, 69, 68, 78, 61, 71, 54, 77, 63, 63, 92, 73, 80, 
    88, 84, NA, 79, 76, 62, 90, 87, 89), p17_total = c(10, 10, 
    5, 9, 9, 7, 15, 11, 6, 12, 11, 4, 9, 14, 9, 9, 11, 14, 6, 
    5, 10, 10, 9, 13, 12, 7, 11, 12, 7, 4), geaf_tot = c(1986.01, 
    1286.71, 1230.77, 1510.49, 839.16, 2144.52, 5361.31, 1678.32, 
    4055.94, 2601.4, 3363.64, 3076.92, 5342.66, 2769.23, 2601.4, 
    1693.24, 4055.94, 3146.85, 3916.08, 6405.59, 2442.89, 671.33, 
    867.13, 1585.08, 3153.85, 3188.81, 7986.01, 839.16, 7552.45, 
    2937.06), glucosa = c(127, 97, 95, 102, 119, 113, 109, 105, 
    93, 167, 85, 108, 122, 112, 113, 120, 100, 108, 100, 86, 
    129, 136, 98, 97, 130, 125, 109, 102, NA, 181), albumi = c(4.47, 
    4.82, 4.78, 4.22, 4.59, 4.5, 4.33, 4.87, 4.83, 4.98, 4.23, 
    4.77, 4.76, 4.98, 4.18, 4.51, 4.72, 4.87, 4.77, 4.61, 4.55, 
    4.77, 4.6, 4.59, 4.25, 4.71, 4.47, 4.54, NA, 4.63), coltot = c(157, 
    191, 276, 248, 248, 217, 187, 301, 173, 230, 258, 238, 231, 
    181, 183, 243, 223, 195, 237, 245, 164, 145, 199, 234, 178, 
    192, 201, 198, NA, 159), hdl = c(39, 50, 57, 59, 49, 44, 
    60, 98, 52, 73, 58, 44, 58, 60, 48, 46, 73, 58, 39, 47, 38, 
    45, 59, 56, 72, 34, 78, 62, NA, 54), ldl_calc = c(91, 124, 
    204, 133, 155, 140, 105, 162, 91, 141, 182, 173, 155, 107, 
    83, 150, 132, 124, NA, 167, 101, 88, 121, 160, 84, 130, 112, 
    120, NA, NA), trigli = c(137, 87, 74, 282, 219, 165, 112, 
    203, 149, 78, 89, 105, 91, 71, 259, 236, 92, 63, 447, 157, 
    123, 58, 94, 90, 112, 139, 53, 80, NA, 429), hba1c = c(6.57, 
    5.82, 5.68, 5.96, 6.11, 5.73, 5.48, 5.8, 5.6, 7.8, 5.21, 
    5.73, 6.1, 5.86, 6.37, 6.27, 5.22, 5.59, 5.47, 5.95, 6.96, 
    NA, 5.47, 4.99, NA, 6.25, 5.79, 5.79, NA, 6.54), i_hucpeptide = c(NA, 
    NA, 466.64, 838.61, 847.89, 1481.03, 819.65, NA, 1298.6, 
    NA, 564.59, 544.2, 755.73, 1057.83, 957.43, NA, 957.33, 1002.34, 
    1104, NA, NA, NA, NA, 594.6, NA, 815.82, 922.08, 628.54, 
    NA, 1591.01), i_hughrelin = c(NA, NA, 410.97, 553.65, 453, 
    352.44, 527.01, NA, 328.27, NA, 1668.41, 460.06, 1072.27, 
    260.24, 749.03, NA, 1327.91, 363.79, 524.53, NA, NA, NA, 
    NA, 1051.1, NA, 143.32, 1076.49, 1565.85, NA, 607.31), i_hugip = c(NA, 
    NA, 2.67, 2.67, 2.67, 2.67, 2.67, NA, 2.67, NA, 2.67, 2.67, 
    690.74, 1165.16, 2.67, NA, 2.67, 2.67, 2.67, NA, NA, NA, 
    NA, 2.67, NA, 2.67, 2.67, 2.67, NA, 2.67), i_huglp1 = c(NA, 
    NA, 127.66, 284.34, 200.13, 59.3, 234.84, NA, 503.42, NA, 
    103.9, 14.14, 71.6, 56.41, 75.13, NA, 161.36, 124.19, 220.52, 
    NA, NA, NA, NA, 14.14, NA, 112.57, 100.52, 237.55, NA, 470.91
    ), i_huglucagon = c(NA, NA, 333.79, 649.94, 726.99, 395.38, 
    610.5, NA, 434.42, NA, 502.4, 127.62, 268.23, 10.48, 428.15, 
    NA, 716.02, 238.95, 320.32, NA, NA, NA, NA, 10.48, NA, 238, 
    487.42, 297.6, NA, 495.16), i_huinsulin = c(NA, NA, 129.24, 
    270.98, 299.75, 730.82, 267.54, NA, 616.91, NA, 121.26, 85.34, 
    224.96, 247.48, 220.75, NA, 181.85, 341.25, 551.46, NA, NA, 
    NA, NA, 133.42, NA, 263.87, 279.45, 94.78, NA, 573.14), i_huleptin = c(NA, 
    NA, 3992.49, 17806.43, 8409.76, 11511.43, 2965.92, NA, 3223.08, 
    NA, 9018.79, 1039.45, 2613.33, 2128.98, 7307.89, NA, 13492.13, 
    2883.77, 4775.98, NA, NA, NA, NA, 2602.96, NA, 2829.59, 8511.92, 
    3528.77, NA, 11487.15), i_hupai1 = c(NA, NA, 997.29, 2499.25, 
    3085.25, 1909.44, 1730.55, NA, 3333.37, NA, 1424.3, 1857.71, 
    2578.46, 2268.52, 2222.97, NA, 2722.92, 1300.69, 2732.11, 
    NA, NA, NA, NA, 1204.36, NA, 2483.08, 2289.67, 1791.79, NA, 
    6595.54), i_huresistin = c(NA, NA, 3044.48, 5774.77, 3221.72, 
    4925.57, 5170.95, NA, 3683.64, NA, 4041.32, 6771.31, 5119.11, 
    9521.7, 3328.41, NA, 5061.65, 3773.39, 3039.39, NA, NA, NA, 
    NA, 4405.17, NA, 2577.84, 3433.82, 6802.94, NA, 6461.67), 
    i_huvisfatin = c(NA, NA, 302.3, 2083.46, 2989.72, 1118.7, 
    8.64, NA, 96.03, NA, 2209.51, 8.64, 1944.37, 1415.55, 678.33, 
    NA, 4349.56, 8.64, 410.1, NA, NA, NA, NA, 117, NA, 8.64, 
    2308.8, 228.53, NA, 1766.64), col_rema = c(27, 17, 15, 56, 
    44, 33, 22, 41, 30, 16, 18, 21, 18, 14, 52, 47, 18, 13, NA, 
    31, 25, 12, 19, 18, 22, 28, 11, 16, NA, NA), homa = c(NA, 
    NA, 5.053, 11.374, 14.679, 33.985, 12.001, NA, 23.61, NA, 
    4.242, 3.793, 11.294, 11.406, 10.265, NA, 7.484, 15.167, 
    22.694, NA, NA, NA, NA, 5.326, NA, 13.574, 12.535, 3.978, 
    NA, 42.691), i_pcr = c(NA, NA, 0.41, 0.82, NA, 2.08, 0.08, 
    NA, 0.1, NA, 0.38, 0.05, 0.04, 0.35, 0.2, NA, 0.98, 0.02, 
    NA, NA, NA, NA, NA, 0.2, NA, 0.1, 0.16, 0.16, NA, 2.93)), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))

Afterwards I am defining my iteration and my variables database

ex<- subset(data[, 6:30])

for (i in 1:length(ex)) {
  var_1 <- ex[,i]
  var_1 <- unlist(var_1)
  lme_1 <- lme(var_1 ~ sexo_s1*peso1 + edad_s1 + p17_total + poly(time, 2)*grupo_int_v00,
             random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
           data = dat_longer, subset = !is.na(var_1))

Error in model.frame.default(formula = ~time + var_1 + sexo_s1 + peso1 +  : 
  invalid type (list) for variable 'var_1'

I have tried unlisting/as.data.frame in before running the loop

for (i in 1:length(data)) {
  var_1 <- data[,i]
  var_1 <- unlist(var_1) #or as.data.frame(var_1)
    lme_1 <- lme(var_1 ~ sexo_s1*peso1 + edad_s1 + p17_total + poly(time, 2)*grupo_int_v00,
             random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
           data = dat_longer, subset = !is.na(var_1))
}
Error in model.frame.default(formula = ~time + var_1 + sexo_s1 + peso1 +  : 
  variable lengths differ (found for 'var_1')

I have also tried to develop a new function to iterate over

lme_z <- function(z){
out <-  lme(z ~ sexo_s1*peso1 + edad_s1 + p17_total + poly(time, 2)*grupo_int_v00,
             random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
           data = dat_longer, subset = !is.na(z))
  
}
Error

If there is some contribution to iterate in the response variable (I know Ben Bolker is an expert) Thanks in advance

Mikael Jagan · Answer 1 · 2022-03-01T19:14:05.683

If data is a data frame containing all of the variables that you use in your formula, including all of the responses that you want to consider, then you can do:

f <- function(resp) {
    fixed <- . ~ sexo_s1 * peso1 + edad_s1 + p17_total + poly(time, 2) * grupo_int_v00
    fixed[[2L]] <- as.name(resp)
    lme(fixed = fixed,
        random = ~poly(time, 2) | paciente, 
        data = data,
        subset = !is.na(data[[resp]]),
        control = lmeControl(opt = "optim"))
}

list_of_lme_objects <- lapply(names_of_response_variables, f)

An important piece is:

fixed <- . ~ sexo_s1 * peso1 + edad_s1 + p17_total + poly(time, 2) * grupo_int_v00
fixed[[2L]] <- as.name(resp)

The second statement injects the response named resp into the left hand side of the formula template. A more transparent example:

fixed <- . ~ world
fixed[[2L]] <- as.name("hello")
fixed
## hello ~ world

Another important piece is:

subset = !is.na(data[[resp]])

Here, the right hand side actually evaluates to a logical vector of length equal to the number of rows of data. You might consider passing na.action = na.omit instead of subset, though that will also omit rows where the independent variables have missing values, so the semantics are slightly different.

The variable grupo_int_v00 is missing from your data frame. You'll have to fix that on your end in order to test the code...

Ben Bolker · Answer 2 · 2022-03-08T19:29:40.650

I was going to suggest:

formvars <- c("sexo_s1*peso1",
              "edad_s1",
              "p17_total",
              "poly(time, 2)")
## excluded *grupo_int_v00 since not in example data frame
respvars <- names(df)[7:30]
result <- list()
for (r in respvars) {
  result[[r]] <- lme(reformulate(formvars, response = r),
               random = ~ poly(time, 2)|paciente,
               control=lmeControl(opt="optim"),
               data = df,
               na.action = na.exclude)
}

Many of @MikaelJagan's points are well taken. In particular:

grupo_int_v00 excluded since it wasn't in your example data set
this code doesn't work for your example since there are only two complete cases (i.e., observations with no missing predictors/responses) in the data set, so we can't fit a quadratic polynomial ("degree must be less than the number of unique points")
I used na.exclude, which obviates your subset argument; it excludes NA values when fitting but will re-introduce them e.g. in calculating predictions or residuals

I can't get this to work with your new example. (1) now that you added a column, `peso1` (a predictor variable) is column 6, so we have to start from column 7. (2) In the data set you've given, each patient only has one observation, so the model doesn't run. — Ben Bolker, Mar 08 '22 at 19:30

Looping/sapply through nlme function

2 Answers2