Problem
The documentation for the readMat() function says: "For the MAT v5 format, cell structures are read into R as a list structure."
This creates a problem for me, as I am not able to convert the list object back to the original table structure. In the original files I inherited, each row (rather than each column) holds the answers to a different questionnaire (row 1 = questionnaire 1, row 2 = questionnaire 2, etc.), but readMat() builds the list vertically (column by column), so my questionnaire items end up out of order.
Code for desired output
Here is code that reproduces a simplified example of the desired output, which is also how the original file appears in the Matlab cell structure:
list1 <- list("2", "34", "17", NA, NA, NA)
list2 <- list("32", "43", NA, NA, NA, NA)
list3 <- list("C", "D", "A", "F", "G", "I")
list4 <- list("455", NA, NA, NA, NA, NA)
df <- data.frame()
df <- rbind(df,list1,list2,list3,list4)
colnames(df) <- NULL
rownames(df) <- NULL
df
This outputs the following (DESIRED OUTPUT/ORIGINAL MATLAB STRUCTURE):
1 2 34 17 <NA> <NA> <NA>
2 32 43 <NA> <NA> <NA> <NA>
3 C D A F G I
4 455 <NA> <NA> <NA> <NA> <NA>
This way I can select by row instead of having the observations in a scrambled order. Note that I replaced the NULL values with NA for this example; otherwise I got an error while building the data frame.
Code for undesired output
However, to reproduce the result of importing from Matlab into R with readMat(), we need hefty code like this:
list1 <- list(matrix("2"))
list2 <- list(matrix("32"))
list3 <- list(matrix("C"))
list4 <- list(matrix("455"))
list5 <- list(matrix("34"))
list6 <- list(matrix("43"))
list7 <- list(matrix("D"))
list8 <- NULL
list9 <- list(matrix("17"))
list10 <- NULL
list11 <- list(matrix("A"))
list12 <- NULL
list13 <- NULL
list14 <- NULL
list15 <- list(matrix("F"))
list16 <- NULL
list17 <- NULL
list18 <- NULL
list19 <- list(matrix("G"))
list20 <- NULL
list21 <- NULL
list22 <- NULL
list23 <- list(matrix("I"))
list24 <- NULL
(mylist <- list(list1, list2, list3, list4, list5,
list6, list7, list8, list9, list10,
list11, list12, list13, list14, list15,
list16, list17, list18, list19, list20,
list21, list22, list23, list24))
Which outputs the following:
[[1]]
[[1]][[1]]
[,1]
[1,] "2"
[[2]]
[[2]][[1]]
[,1]
[1,] "32"
[[3]]
[[3]][[1]]
[,1]
[1,] "C"
[[4]]
[[4]][[1]]
[,1]
[1,] "455"
[[5]]
[[5]][[1]]
[,1]
[1,] "34"
[[6]]
[[6]][[1]]
[,1]
[1,] "43"
[[7]]
[[7]][[1]]
[,1]
[1,] "D"
[[8]]
NULL
[[9]]
[[9]][[1]]
[,1]
[1,] "17"
[[10]]
NULL
[[11]]
[[11]][[1]]
[,1]
[1,] "A"
[[12]]
NULL
[[13]]
NULL
[[14]]
NULL
[[15]]
[[15]][[1]]
[,1]
[1,] "F"
[[16]]
NULL
[[17]]
NULL
[[18]]
NULL
[[19]]
[[19]][[1]]
[,1]
[1,] "G"
[[20]]
NULL
[[21]]
NULL
[[22]]
NULL
[[23]]
[[23]][[1]]
[,1]
[1,] "I"
[[24]]
NULL
In other threads, most people said to unlist, but unlisting my list does not let me select questionnaires by row, for instance (especially since NULL values are dropped when unlisting, so the dimensions are not preserved):
unlist(mylist)
[1] "2" "32" "C" "455" "34" "43" "D" "17" "A" "F" "G" "I"
You can see it's tidier, but the items are not in the right order, so it's hard to put them back into a data frame. Some said to transform it into a matrix, which does not really resolve the problem:
matrix(unlist(mylist))
[,1]
[1,] "2"
[2,] "32"
[3,] "C"
[4,] "455"
[5,] "34"
[6,] "43"
[7,] "D"
[8,] "17"
[9,] "A"
[10,] "F"
[11,] "G"
[12,] "I"
I've tried other solutions from the threads to no avail, e.g.:
do.call(rbind.data.frame, mylist) # doesn't work
as.data.frame(matrix(unlist(mylist),nrow=length(mylist),byrow=TRUE)) # doesn't work
Here are some related threads: 1, 2, 3, 4, 5, 6, 7, and 8.
Question
Why is it necessary for readMat() to import MAT v5 format cell structures as lists rather than data frames (it would save us so much trouble)?
I'm looking for a solution, ideally in base R, to transform the readMat() list object into a data frame. I need to be able to automate it, since I have thousands of such files that I'm not going to edit, restructure, or save to a different format individually in Matlab. The number and location of NULL values vary, as does the length of each row (some questionnaires have more items than others). Thanks!
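For reference, here is a minimal base R sketch of one direction that might work. It is not a confirmed solution. It assumes that the list really is stored column by column, as in the simplified example above, and that the number of rows (questionnaires) is known in advance. The names cells, n_rows, flat, and res are hypothetical, and the toy list only mimics the structure shown in this question, not an actual readMat() result.

```r
# Toy list mimicking the column-by-column layout from the example above:
# 4 questionnaires (rows) x 6 items (columns), with empty cells as NULL.
cells <- list(
  list(matrix("2")),  list(matrix("32")), list(matrix("C")), list(matrix("455")),
  list(matrix("34")), list(matrix("43")), list(matrix("D")), NULL,
  list(matrix("17")), NULL,               list(matrix("A")), NULL,
  NULL,               NULL,               list(matrix("F")), NULL,
  NULL,               NULL,               list(matrix("G")), NULL,
  NULL,               NULL,               list(matrix("I")), NULL
)

n_rows <- 4  # assumption: the number of questionnaires is known

# Replace every empty cell with NA so that each position survives flattening
# (a plain unlist() would silently drop the NULL entries).
flat <- vapply(
  cells,
  function(x) if (length(x) == 0) NA_character_ else as.character(unlist(x))[1],
  character(1)
)

# matrix() fills column by column by default, which matches the import order.
res <- as.data.frame(matrix(flat, nrow = n_rows), stringsAsFactors = FALSE)

res[3, ]  # select the third questionnaire by row
```

The key design choice is to turn NULL into NA before flattening, so the vector keeps all 24 positions and the default column-wise fill of matrix() can restore the rows. If the real imported object does not follow this layout, or the row count is not known, the sketch would need adapting.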