Any more concise `data.table` way to unnest a nested column in data.table?

Question

Assuming we have a data.table with a nested column val

dt <- data.table(
  grp = c(1, 2, 1, 3, 4),
  val = list("a", c("b", "c"), c("d", "e", "f"), "g", c("h", "i"))
)

which shows as

> dt
   grp   val
1:   1     a
2:   2   b,c
3:   1 d,e,f
4:   3     g
5:   4   h,i

Question

I would like to unnest the val column, where a possible option is using tidyr::unnest, i.e.,

> dt %>%
+   unnest(val)
# A tibble: 9 × 2
    grp val
  <dbl> <chr>
1     1 a
2     2 b
3     2 c
4     1 d
5     1 e
6     1 f
7     3 g
8     4 h
9     4 i

I am wondering if we can implement it by using the data.table only.

Watch out the order of values in the column grp, I would like to preserve the order like 1,2,1,3,4 rather than 1,1,2,3,4.

My `data.table` Attempt

My attempt is as below

> dt[, id := .I][, lapply(.SD, unlist), id][, id := NULL][]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

or

> dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

but I guess there might be some more concise and elegant way to do this, e.g., without creating an auxiliary column id or using rep + lengths.

Any idea? Much appreciated!

Links to several SO posts here: [Create an efficient unnest function](https://github.com/Rdatatable/data.table/issues/2146). Maybe some relevant? — Henrik, Sep 13 '22 at 20:17
E.g. [How to 'unlist' a column in a data.table](https://stackoverflow.com/questions/44336733/how-to-unlist-a-column-in-a-data-table), with suggestions similar to those here. — Henrik, Sep 17 '22 at 11:37

s_baldur · Accepted Answer · 2022-09-13T13:59:41.260

8

One more option:

dt[, .(grp, val = unlist(val)), by = .I][, !"I"]
#      grp    val
#    <num> <char>
# 1:     1      a
# 2:     2      b
# 3:     2      c
# 4:     1      d
# 5:     1      e
# 6:     1      f
# 7:     3      g
# 8:     4      h
# 9:     4      i

PS. If you are using data.table 1.4.2 or older by = .I won't work. Instead you can use:

dt[, .(grp, val = unlist(val)), by = 1:nrow(dt)][, !"nrow"]

PS2. I think your dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))] is neater.

edited Sep 13 '22 at 13:59

answered Sep 13 '22 at 13:48

s_baldur

29,441
4
36
69

For some reason, I cannot reproduce your output – Maël Sep 13 '22 at 13:51
The val column is correct, but the grp is all messed up: ` grp val 1: 1 a 2: 2 b 3: 1 c 4: 3 d 5: 4 e 6: 1 f 7: 2 g 8: 1 h 9: 3 i` – Maël Sep 13 '22 at 13:53
Would you mind adding every step? maybe it's just me – Maël Sep 13 '22 at 13:53
I have 1.14.2. Let's see if it works for other people; it is not my question after all; it's not a big deal if it does not work only on my session – Maël Sep 13 '22 at 13:57
1

I think found out the difference and added a note. – s_baldur Sep 13 '22 at 14:00
Nice solution of using `by = .I`! – ThomasIsCoding Sep 13 '22 at 19:18
same as @Maël, with data.table 1.14.2 I do not find the proper grp order, and have a warning message `Item 1 has 5 rows but longest item has 9; recycled with remainder` – denis Sep 20 '22 at 07:01

score 7 · Answer 2 · answered Sep 16 '22 at 16:02

Another option without group by like this:

library(data.table)
dt[rep(1:.N, lengths(val))][, val:=unlist(dt$val)][]
#>    grp val
#> 1:   1   a
#> 2:   2   b
#> 3:   2   c
#> 4:   1   d
#> 5:   1   e
#> 6:   1   f
#> 7:   3   g
#> 8:   4   h
#> 9:   4   i

^{Created on 2022-09-16 with reprex v2.0.2}

score 1 · Answer 3 · answered Sep 20 '22 at 10:05

1

This should work:

dt <- dt[, .(grp, unlist(val)), by=.(rleid(grp))][, rleid := NULL]

answered Sep 20 '22 at 10:05

koolmees

2,725
9
23

score 0 · Answer 4 · answered Sep 13 '22 at 13:02

0

I guess this works:

dt[, lapply(.SD, unlist), by = .(grp)]

   grp val
1:   1   a
2:   1   d
3:   1   e
4:   1   f
5:   2   b
6:   2   c
7:   3   g
8:   4   h
9:   4   i

answered Sep 13 '22 at 13:02

Arthur Welle

586
5
15

1

It is a nice attempt, but the `grp` values should retain the same order as in the original `dt`, e.g., `1,2,1,3,4` rather than `1,1,2,3,4`. – ThomasIsCoding Sep 13 '22 at 13:05

Any more concise `data.table` way to unnest a nested column in data.table?

Question

My data.table Attempt

4 Answers4

My `data.table` Attempt