I have a nested list; for some indices, some variables are missing.

[[1]]
    sk   ques   pval 
  "10" "sfsf" "0.05" 

[[2]]
    sk   ques   pval   diff 
 "24" "wwww" "0.11"  "0.3" 

[[3]]
    sk   ques   pval   diff    imp 
  "24" "wwww" "0.11"  "0.3"    "2" 

How can I convert this to a data frame in which, for the first row, data$diff[1] is NA? In the case above, the result would be a data frame with 5 variables and 3 observations.

The number of variables in the data frame should be the number of unique names across the list elements, and any value missing from a list element should be replaced with NA.

Thank you,

EDIT: Data format

list(structure(c("10", "sfsf", "0.05"), .Names = c("sk", "ques", 
"pval")), structure(c("24", "wwww", "0.11", "0.3"), .Names = c("sk", 
"ques", "pval", "diff")), structure(c("24", "wwww", "0.11", "0.3", 
"2"), .Names = c("sk", "ques", "pval", "diff", "imp")))
won782
  • Inside each list element, are those vectors or data frames? They seem like named vectors. Could you please post the output of `dput(head(list, 3))` – Rich Scriven Nov 26 '14 at 16:21
  • Good catch @RichardScriven. I had assumed they were proper data.frames. You can still use `rbind.fill` if you do the conversion: `rbind.fill(lapply(mydata, function(x)as.data.frame(t(x))))` – MrFlick Nov 26 '14 at 16:32

1 Answer

We get the length of each list element ('indx') by looping with sapply. In recent versions of R (3.2.0 and later), lengths can replace the sapply(.., length) step. We then set the length of every element to the maximum of 'indx' (with length<-), which pads the shorter elements with NA at the end. Finally, we rbind the list elements, convert the result to a data.frame, and set the column names from the longest element.

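 # length of each list element
 indx <- sapply(lst, length)
 # indx <- lengths(lst)  # equivalent, R >= 3.2.0
 # pad every element with NA up to the maximum length, then bind them as rows
 res <- as.data.frame(do.call(rbind, lapply(lst, `length<-`,
                           max(indx))))

 # take the column names from the longest element
 colnames(res) <- names(lst[[which.max(indx)]])
 res
 #   sk ques pval diff  imp
 # 1 10 sfsf 0.05 <NA> <NA>
 # 2 24 wwww 0.11  0.3 <NA>
 # 3 24 wwww 0.11  0.3    2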
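
To see what the `length<-` step does on its own, here is a small illustration on a single named vector (the vector `x` is made up for this demonstration). Extending the length pads the values with NA and the names with empty strings, which is why the column names are set separately afterwards.

```r
# A named character vector with three elements (illustrative only)
x <- c(sk = "10", ques = "sfsf", pval = "0.05")

# Extend it to length 5; the two new positions are filled with NA
padded <- `length<-`(x, 5)

padded         # values: "10" "sfsf" "0.05" NA NA
names(padded)  # names:  "sk" "ques" "pval" "" ""
```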

data

 lst <- list(structure(c("10", "sfsf", "0.05"), .Names = c("sk", "ques", 
 "pval")), structure(c("24", "wwww", "0.11", "0.3"), .Names = c("sk", 
 "ques", "pval", "diff")), structure(c("24", "wwww", "0.11", "0.3", 
 "2"), .Names = c("sk", "ques", "pval", "diff", "imp")))
akrun
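
Note that the approach above pads by position, so it relies on the missing variables always being the trailing ones and on the longest element carrying every name. If names could be missing anywhere, one possible variant is to index each element by the full set of names instead. This is a sketch under that assumption, not part of the original answer; `all_names`, `rows` and `res2` are names chosen here for illustration.

```r
# Same data as in the question, written with named c() calls
lst <- list(c(sk = "10", ques = "sfsf", pval = "0.05"),
            c(sk = "24", ques = "wwww", pval = "0.11", diff = "0.3"),
            c(sk = "24", ques = "wwww", pval = "0.11", diff = "0.3", imp = "2"))

# Union of all names, in order of first appearance
all_names <- unique(unlist(lapply(lst, names)))

# Indexing a named vector by a name it lacks returns NA,
# so each element is expanded to the full set of names
rows <- lapply(lst, function(x) unname(x[all_names]))

# Bind the equal-length rows and label the columns
res2 <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
colnames(res2) <- all_names
res2
```

Because the lookup is by name rather than by position, the order of the variables inside each list element does not matter here.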
  • When I try this solution, I get `Error in `row.names<-.data.frame`(`*tmp*`, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique value when setting 'row.names': ‘1’ ` at the point where `res` is defined. I'm not sure why that is happening. – jessi Aug 29 '16 at 05:31
  • @jessi If you are assigning duplicated row names to a `data.frame`, it will not work, as a `data.frame` can take only unique row names; duplicate row names are okay for a matrix, i.e. without the `as.data.frame` – akrun Aug 29 '16 at 05:33
  • @akrun, FYI tested this today, my resulting data frame is given colnames without calling `colnames(res) <- names(lst[[which.max(indx)]])` – msoderstrom Jul 29 '18 at 11:15
  • @MartinSöderström Based on the example in my post, before calling the last line of code, 'V4' and 'V5' are the column names for the 4th and 5th column which is changed with `colnames(res) <-` – akrun Jul 29 '18 at 16:53
  • @akrun What if it is nested list? [[1]] sk ques pval [[act]] "10" "sfsf" "0.05" "time" – Rudr Oct 04 '18 at 16:55
  • @Rudr You may need a nested lapply – akrun Oct 04 '18 at 16:57