
I’ve been given a nested list where each sublist contains information about values of one specific data frame. The list has the following structure:

summary <- list(df1 = list(value1 = "phone", count = 11, ratio = 78.57, value2 = "mail", count = 13, ratio = 92.86, value3 = "zoom", count = 8, ratio = 57.14),
df2 = list(value4 = "yes", count = 4, ratio = 28.57, value5 = "no", count = 10, ratio = 71.43))

str(summary)
List of 2
 $ df1:List of 9
  ..$ value1: chr "phone"
  ..$ count : num 11
  ..$ ratio : num 78.6
  ..$ value2: chr "mail"
  ..$ count : num 13
  ..$ ratio : num 92.9
  ..$ value3: chr "zoom"
  ..$ count : num 8
  ..$ ratio : num 57.1
 $ df2:List of 6
  ..$ value4: chr "yes"
  ..$ count : num 4
  ..$ ratio : num 28.6
  ..$ value5: chr "no"
  ..$ count : num 10
  ..$ ratio : num 71.4

Here, summary[[1]] states that the value “phone” occurred eleven times in data frame 1 with a certain ratio, the value “mail” occurred 13 times and so on. The same applies for data frame 2 where “yes” was counted four times etc.

Now I want to create a nested list that summarizes the values, counts, and ratios for each sublist in summary. More precisely, each sublist of the result should consist of only three elements, value, count and ratio, holding the values and their corresponding counts and ratios as vectors. The desired list result should have the following structure:

result <- list(res_df1 = list(value = c("phone", "mail", "zoom"), count = c(11,13,8), ratio = c(78.57, 92.86, 57.14)),
res_df2 = list(value = c("yes", "no"), count = c(4, 10), ratio = c(28.57, 71.43)))

str(result)
List of 2
 $ res_df1:List of 3
  ..$ value: chr [1:3] "phone" "mail" "zoom"
  ..$ count: num [1:3] 11 13 8
  ..$ ratio: num [1:3] 78.6 92.9 57.1
 $ res_df2:List of 3
  ..$ value: chr [1:2] "yes" "no"
  ..$ count: num [1:2] 4 10
  ..$ ratio: num [1:2] 28.6 71.4

I’ve come up with a solution that produces roughly this outcome (a list of data frames rather than a list of lists), but it feels more like a workaround than a nice R solution:

library(rlist)
result <- list()
for (i in seq_along(summary)) {
    tmp <- summary[[i]]
    # elements come in triples: value, count, ratio
    value <- as.character(tmp[seq(1, length(tmp), 3)])
    count <- as.numeric(tmp[seq(2, length(tmp), 3)])
    ratio <- as.numeric(tmp[seq(3, length(tmp), 3)])
    df <- cbind.data.frame(value, count, ratio)
    # append the data frame for this sublist to the result
    result <- list.append(result, df)
}

I couldn’t come up with a working solution that uses, for example, an lapply approach or similar. Is there a nicer, more compact way to do this? Any suggestions are appreciated!

T L

1 Answer


Have you tried something like this?

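for (i in 1:3) {
  # every third element starting at i: values (i = 1), counts (i = 2), ratios (i = 3)
  idx <- seq(i, length(summary[[1]]), 3)
  assign(paste0("new", i), lapply(summary[[1]][idx], "[[", 1))
}
# use.names = FALSE binds by position, since the element names differ between new1, new2 and new3
data.table::rbindlist(list(new1, new2, new3), use.names = FALSE)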

You can do this for every sublist (perhaps replacing the hard-coded 3 in 1:3 with a max_length variable). This is a more R-like approach: we use lapply, [[, and at the end the very fast rbindlist from the data.table package (I wrote the data.table:: prefix just to highlight which package it comes from).

That gives you:

   value1 value2 value3
1:  phone   mail   zoom
2:     11     13      8
3:  78.57  92.86  57.14
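
Applying the same every-third-element indexing to each sublist can be sketched in base R as follows. This is only a sketch: collapse_triples is a hypothetical helper name, and it assumes every sublist is a flat run of (value, count, ratio) triples in exactly that order, as in the question's example.

```r
# Example input copied from the question
summary <- list(
  df1 = list(value1 = "phone", count = 11, ratio = 78.57,
             value2 = "mail",  count = 13, ratio = 92.86,
             value3 = "zoom",  count = 8,  ratio = 57.14),
  df2 = list(value4 = "yes", count = 4,  ratio = 28.57,
             value5 = "no",  count = 10, ratio = 71.43)
)

# Hypothetical helper: assumes x is a flat list of (value, count, ratio) triples
collapse_triples <- function(x) {
  stopifnot(length(x) %% 3 == 0)
  pos <- seq(1, length(x), by = 3)  # positions of the value entries
  list(
    value = unlist(x[pos],     use.names = FALSE),
    count = unlist(x[pos + 1], use.names = FALSE),
    ratio = unlist(x[pos + 2], use.names = FALSE)
  )
}

# One call per sublist; names are prefixed to match the desired result
result <- lapply(summary, collapse_triples)
names(result) <- paste0("res_", names(summary))
str(result)
```

Unlike the rbindlist output above, this keeps count and ratio numeric, because each of the three fields is collected into its own vector instead of being stacked into shared columns.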

This rbindlist approach also scales well, as you can see in the post "How to rbind many (+1000) *.rds files fast", because it can bind a lot of files/lists much faster than append or rbind.
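
As a rough illustration of binding many per-sublist tables in one call, the sketch below builds one data.table per sublist and stacks them with rbindlist, using its idcol argument to record which sublist each row came from. The column name "source" is an arbitrary choice, and the code again assumes flat (value, count, ratio) triples.

```r
library(data.table)

# Example input copied from the question
summary <- list(
  df1 = list(value1 = "phone", count = 11, ratio = 78.57,
             value2 = "mail",  count = 13, ratio = 92.86,
             value3 = "zoom",  count = 8,  ratio = 57.14),
  df2 = list(value4 = "yes", count = 4,  ratio = 28.57,
             value5 = "no",  count = 10, ratio = 71.43)
)

# One data.table per sublist (assumes value, count, ratio triples in order)
tables <- lapply(summary, function(x) {
  pos <- seq(1, length(x), by = 3)
  data.table(
    value = unlist(x[pos],     use.names = FALSE),
    count = unlist(x[pos + 1], use.names = FALSE),
    ratio = unlist(x[pos + 2], use.names = FALSE)
  )
})

# Stack all tables; idcol stores the list names (df1, df2) in a "source" column
combined <- rbindlist(tables, idcol = "source")
print(combined)
```

Here every table has the same column names, so no use.names adjustment is needed.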

Patrick Bormann
  • thank you, that's a nice approach! I've added `use.names = FALSE` to avoid the error of missing column (names). – T L Mar 11 '21 at 20:48