Unlist nested list columns in data.table

Question

How can I unlist a nested list column in a data.table? Assume all the list elements are of the same type. The list elements are named, and the names have to be carried over as well.
It is roughly the opposite of aggregating into a list column in data.table.
I think it is worth having in the SO data.table knowledge base.
My current workaround is below; I'm looking for a somewhat more canonical answer.

library(data.table)
dt <- data.table(
    a = letters[1:3], 
    l = list(list(c1=6L, c2=4L), list(x=2L, y=4L, z=3L), list())
)
dt[]
#    a      l
# 1: a <list>
# 2: b <list>
# 3: c <list>
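# Group by a per-row id so that each group sees exactly one element of l;
# unlist() then expands that row into one output row per inner element.
# The empty list in row 3 shows up as NA in nm and ul in the output below.
dt[, .(a = rep(a, length(l)),
       nm = names(unlist(l)),
       ul = unlist(l)),
   .(id = seq_along(a))
   ][, id := NULL
     ][]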
#    a nm ul
# 1: a c1  6
# 2: a c2  4
# 3: b  x  2
# 4: b  y  4
# 5: b  z  3
# 6: c NA NA

Can't you just do `dt[, .(nm = names(unlist(l)), ul = unlist(l)), by = a]`? — David Arenburg, Jul 15 '15 at 12:15
The last row, which has an empty list, is not handled in that way. — jangorecki, Jul 15 '15 at 12:19
@AnandaMahto Hard to say now, in my use case I assume named list would not be empty but store some dummy `NA` or `integer()`. — jangorecki, Jul 15 '15 at 12:42
If your empty list is at the start this workaround will not work because `data.table` will not be able to determine the type of column for the result. — Simon O'Hanlon, Jul 15 '15 at 14:20
I'm curious what use case you have in mind when you say it's important to have this in the knowledge base..? Like `lm` class objects? I've only ever had list columns containing unnamed string vectors. — Frank, Jul 15 '15 at 15:15
@Frank more or less the same as what the `tables()` function does, but I want to collect more metadata, e.g. column types: `data.table(name = "dt", coltypes = list(list(col1="integer", col2="character")))`. @Simon good point. — jangorecki, Jul 15 '15 at 18:51
@jangorecki But the inner object does not need to be a list in that case. Doesn't this also serve the purpose? `data.table(name = "dt", coltypes = list(c(col1="integer", col2="character")))` — Frank, Jul 15 '15 at 18:54
For coltypes (`typeof`) yes, but for colclasses (`class`) no, as a single column can have multiple classes. Anyway, that particular case is not part of the question. So yes, in this question I could use `c` instead of `list`; of course the last empty list would then need to be a zero-length integer. — jangorecki, Jul 15 '15 at 19:09
@Frank see related work [demo](https://rawgit.com/jangorecki/5664c3d90ec6213a63d5/raw/bb434646a1596b2110bf3bc8cd4edc28dd9940c0/information_schema.html), if you have any thoughts I would be glad to hear - GMTs chat. — jangorecki, Jul 17 '15 at 12:19
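To make the `c` versus `list` point from the comments concrete, here is a minimal sketch that stores named integer vectors instead of inner lists, with `integer()` standing in for the empty element. The name `dt_vec` is made up for illustration. Note that this vectorized form drops the empty row rather than producing an `NA` row, so it is not a drop-in replacement for the workaround in the question.

```r
library(data.table)

# Hypothetical variant of the question's data: named integer vectors
# instead of inner lists, and integer() for the empty element.
dt_vec <- data.table(
    a = letters[1:3],
    l = list(c(c1 = 6L, c2 = 4L), c(x = 2L, y = 4L, z = 3L), integer())
)

n    <- lengths(dt_vec$l)   # number of elements per row: 2, 3, 0
flat <- unlist(dt_vec$l)    # one named integer vector across all rows

res <- data.table(
    a  = rep(dt_vec$a, n),  # repeat each key once per element
    nm = names(flat),
    ul = unname(flat)
)
res[]
```

Because row `c` has zero elements, `rep()` repeats it zero times and it does not appear in `res`.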

Cath · Accepted Answer · 2015-07-15T14:05:29.440

10

I'm not sure it is more "canonical", but here is a way to modify `l` so that you can use `by = a`, provided you know the type of the data in the list (with some improvements thanks to @DavidArenburg):

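# Replace empty lists with an NA of the known type (this updates dt by reference),
# then unlist within each group of a
dt[lengths(l) == 0, l := NA_integer_][, .(nm = names(unlist(l)), ul = unlist(l)), by = a]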

#   a nm ul
#1: a c1  6
#2: a c2  4
#3: b  x  2
#4: b  y  4
#5: b  z  3
#6: c NA NA
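A possible variant, sketched here under stated assumptions, leaves `dt` unmodified and returns typed `NA` values for an empty list. That also covers the case raised in the comments where the empty list comes first and the result type cannot be inferred. The helper `unnest_one` is a made-up name, and the sketch assumes the values are integers and that `a` is unique per row.

```r
library(data.table)

dt <- data.table(
    a = letters[1:3],
    l = list(list(c1 = 6L, c2 = 4L), list(x = 2L, y = 4L, z = 3L), list())
)

# Hypothetical helper: flatten one inner list into typed columns.
# Assumes integer values; an empty list becomes a single typed NA row.
unnest_one <- function(x) {
    if (length(x) == 0L) {
        return(list(nm = NA_character_, ul = NA_integer_))
    }
    v <- unlist(x)
    list(nm = names(v), ul = unname(v))
}

# Assumes `a` is unique per row, so l[[1L]] is that row's inner list.
res <- dt[, unnest_one(l[[1L]]), by = a]

# Same call with the empty list in the first row.
res_rev <- dt[3:1][, unnest_one(l[[1L]]), by = a]
```

Since the first group already returns a character and an integer column, the column types are fixed whatever the row order.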

edited Jul 15 '15 at 14:05

answered Jul 15 '15 at 13:43
