How can I unlist a nested list column in a data.table? Assume all the list elements are of the same type. The list elements are named, and the names have to be handled as well.
It is roughly the opposite of aggregating into a list column in data.table.
I think it is worth having in the SO knowledge base.
My current workaround is below; I'm looking for a more canonical answer.

library(data.table)
dt <- data.table(
    a = letters[1:3], 
    l = list(list(c1=6L, c2=4L), list(x=2L, y=4L, z=3L), list())
)
dt[]
#    a      l
# 1: a <list>
# 2: b <list>
# 3: c <list>
dt[,.(a = rep(a,length(l)),
      nm = names(unlist(l)),
      ul = unlist(l)),
   .(id = seq_along(a))
   ][, id := NULL
     ][]
#    a nm ul
# 1: a c1  6
# 2: a c2  4
# 3: b  x  2
# 4: b  y  4
# 5: b  z  3
# 6: c NA NA
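
To make the "opposite of aggregation" remark concrete, here is a minimal sketch of the forward direction: collapsing a long table into one named list per group. The names `long` and `agg` are made up for this illustration, and the sketch assumes every group has at least one row, so the empty list for `c` is not reproduced.

```r
library(data.table)

# Hypothetical long-format input: one row per (group, name, value)
long <- data.table(
    a  = c("a", "a", "b", "b", "b"),
    nm = c("c1", "c2", "x", "y", "z"),
    ul = c(6L, 4L, 2L, 4L, 3L)
)

# Aggregate to a list column: each group becomes one named list.
# The outer list() is what makes `l` a list column with one element per group.
agg <- long[, .(l = list(as.list(setNames(ul, nm)))), by = a]
```

Unlisting `agg$l` by group is then the operation asked about here.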
jangorecki
  • Can't you just do `dt[, .(nm = names(unlist(l)), ul = unlist(l)), by = a]`? – David Arenburg Jul 15 '15 at 12:15
  • The last row, which has an empty list, is not handled in that way. – jangorecki Jul 15 '15 at 12:19
  • @jangorecki, Are empty lists also always unnamed? – A5C1D2H2I1M1N2O1R2T1 Jul 15 '15 at 12:34
  • @AnandaMahto Hard to say now. In my use case I assume a named list would not be empty but would store some dummy `NA` or `integer()`. – jangorecki Jul 15 '15 at 12:42
  • If your empty list is at the start this workaround will not work because `data.table` will not be able to determine the type of column for the result. – Simon O'Hanlon Jul 15 '15 at 14:20
  • I'm curious what use case you have in mind when you say it's important to have this in the knowledge base..? Like `lm` class objects? I've only ever had list columns containing unnamed string vectors. – Frank Jul 15 '15 at 15:15
  • @Frank more or less the same as what the `tables()` function does, but I want to collect more metadata. E.g. column types: `data.table(name = "dt", coltypes = list(list(col1="integer", col2="character")))`. @Simon good point. – jangorecki Jul 15 '15 at 18:51
  • @jangorecki But the inner object does not need to be a list in that case. Doesn't this also serve the purpose? `data.table(name = "dt", coltypes = list(c(col1="integer", col2="character")))` – Frank Jul 15 '15 at 18:54
  • For coltypes (`typeof`) yes, but for colclasses (`class`) no, as a single column can have multiple classes. Anyway, that particular case is not part of the question. So yes, in this question I could use `c` instead of `list`; of course the last empty list would then need to be a 0-length integer. – jangorecki Jul 15 '15 at 19:09
  • @Frank see related work [demo](https://rawgit.com/jangorecki/5664c3d90ec6213a63d5/raw/bb434646a1596b2110bf3bc8cd4edc28dd9940c0/information_schema.html), if you have any thoughts I would be glad to hear - GMTs chat. – jangorecki Jul 17 '15 at 12:19

1 Answer

I'm not sure it is more "canonical", but here is a way to modify `l` so that you can use `by = a`, provided you know the type of the data in the list (with some improvements thanks to @DavidArenburg):

dt[lengths(l) == 0, l := NA_integer_][, .(nm = names(unlist(l)), ul = unlist(l)), by = a]

#   a nm ul
#1: a c1  6
#2: a c2  4
#3: b  x  2
#4: b  y  4
#5: b  z  3
#6: c NA NA
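
A possible variant, sketched here under assumptions rather than taken from the answer above: handle the empty case inside `j` and return typed `NA`s, so `dt` is not modified by reference and the column types are fixed even when the empty list is the first row (the concern raised in the comments). It assumes every non-empty inner list is named and holds integers; `res` is a name made up for the illustration.

```r
library(data.table)

dt <- data.table(
    a = letters[1:3],
    l = list(list(c1 = 6L, c2 = 4L), list(x = 2L, y = 4L, z = 3L), list())
)

res <- dt[, {
    v <- unlist(l)  # named integer vector, or NULL for an empty list
    if (length(v) == 0L) {
        # Typed NAs keep the row and pin the column types (assumed integer values)
        .(nm = NA_character_, ul = NA_integer_)
    } else {
        .(nm = names(v), ul = v)
    }
}, by = a]
```

For this data `res` has six rows, with a single `NA` row for group `c`, and `dt$l` is left untouched. For other value types, the `NA_integer_` placeholder would need to be swapped for the matching typed `NA`.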
Cath