How do you iterate a command over a range of sequentially named columns in R in a multidimensional object (i.e. named list)

Question

I'm working with the mice (multiple imputation) library. When it executes, it generates an object that contains what is essentially a list of data frames, one for each column of the original data set on which you're performing the calculation. Each of these data frames holds the imputed values for that column, with one column per imputation run.

Sample Data:

X1  X2  X3  X4  X5  X6  X7
NA  34  NA  -13 -33 NA  -8
-33 -15 -20 NA  -11 -40 NA
NA  -23 -9  12  -32 -9  -25
NA  6   -21 4   -42 -41 6
-4  NA  -9  4   NA  -20 -1
-14 -4  -8  NA  -44 -12 -6
-11 NA  -6  -3  NA  -19 NA
NA  -19 59  19  NA  NA  -31
17  NA  NA  -6  -46 -27 5
-3  -20 NA  27  NA  NA  -13
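The post does not show how sample.data was created. One possible way to build it, assuming the table above is the complete data set, is to read it from inline text:

```r
# Read the sample table from inline text. The literal NA entries are
# parsed as missing values by read.table's default na.strings setting.
sample.data <- read.table(header = TRUE, text = "
X1  X2  X3  X4  X5  X6  X7
NA  34  NA  -13 -33 NA  -8
-33 -15 -20 NA  -11 -40 NA
NA  -23 -9  12  -32 -9  -25
NA  6   -21 4   -42 -41 6
-4  NA  -9  4   NA  -20 -1
-14 -4  -8  NA  -44 -12 -6
-11 NA  -6  -3  NA  -19 NA
NA  -19 59  19  NA  NA  -31
17  NA  NA  -6  -46 -27 5
-3  -20 NA  27  NA  NA  -13
")

# Quick structural check of the resulting data frame
str(sample.data)
```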

To perform the multiple imputations on this data set, one can use the following, which will generate m=5 imputation runs:

install.packages("mice")
library(mice)
impData <- mice(sample.data, m = 5, maxit = 50, method = "pmm", seed = 100, printFlag = FALSE)

To inspect the 5 sets of imputed values for one of the columns, say X1, you can use this:

impData$imp$X1

With this object, I would like to compute, for each imputed value, the median across the 5 imputations.

To do the median calculation for an individual column, I used:

impData$imp$X1$'med' <- apply(impData$imp$X1, 1, FUN = median)

That works fine for one column of my data set. But if I want to write a loop that iterates from i = 1 to N (over all columns), I am not sure how to replace "X1" in the line above with a term based on "i". I think of it as indirect referencing, but I am not sure of the syntax. In my example here, I could just handle each column manually, since there are only 7 columns in the original data set, but my real data (not shared here for brevity) has 51 columns.

Note: I read some examples of using assign to create strings in loops, but I still could not figure out how to then insert that constructed string into the line above. Perhaps assign is the first step, and then I just need the syntax for the second step, or perhaps there's another approach that accomplishes both in a single go. — rsilverst, Dec 15 '19 at 02:53
Don't use `assign`, deal with things in a `list` using `lapply` or such. Please don't include images of sample data (https://meta.stackoverflow.com/a/285557). Please provide a reproducible problem (https://stackoverflow.com/questions/5963269). — r2evans, Dec 15 '19 at 03:21
Per your suggestion, I updated the post to remove the image, add the sample data directly, and include the full set of R code to reproduce the case. I would still welcome any more specific help you can provide, as I still haven't made progress. Thanks. — rsilverst, Dec 16 '19 at 05:32

score 0 · Answer 1 · answered Dec 16 '19 at 20:35

A colleague provided an efficient answer to this problem, so I will share it here in case others eventually have the same question:

.MedianWithData <- function(dat) {
  dat$med <- apply(dat, 1, FUN = median)
  return(dat)
}

impData$imp <- sapply(impData$imp, .MedianWithData, simplify = FALSE)
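
The "indirect referencing" the question asks about can also be written with `[[`, which accepts a character string where `$` needs a literal name. Below is a minimal sketch of that, alongside an `lapply` version as suggested in the comments. It does not call mice: `imp` is a hand-built, hypothetical stand-in for `impData$imp`, and its numbers are made up purely for illustration.

```r
# Hypothetical stand-in for impData$imp: a named list of data frames,
# one per original column, with one column per imputation run.
# The values are invented for illustration only.
imp <- list(
  X1 = data.frame(`1` = c(-4, 17), `2` = c(-3, -11), `3` = c(-14, -4),
                  check.names = FALSE),
  X2 = data.frame(`1` = c(6, -15), `2` = c(-20, 34), `3` = c(-4, -19),
                  check.names = FALSE)
)

# Option 1: loop over the element names and index with [[ ]],
# so the name can come from a variable instead of being typed out.
imp_loop <- imp
for (nm in names(imp_loop)) {
  imp_loop[[nm]]$med <- apply(imp_loop[[nm]], 1, median)
}

# Option 2: lapply over the list; element names are carried over.
imp_lapply <- lapply(imp, function(dat) {
  dat$med <- apply(dat, 1, median)
  dat
})

# Compare the two results
identical(imp_loop, imp_lapply)
```

With a real mice result, the same pattern would presumably apply with `impData$imp` in place of `imp`. The loop form is closest to what the question describes, while the `lapply` form avoids modifying the list in place.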
