10

Assuming we have a data.table with a nested column val

dt <- data.table(
  grp = c(1, 2, 1, 3, 4),
  val = list("a", c("b", "c"), c("d", "e", "f"), "g", c("h", "i"))
)

which shows as

> dt
   grp   val
1:   1     a
2:   2   b,c
3:   1 d,e,f
4:   3     g
5:   4   h,i

Question

I would like to unnest the val column, where a possible option is using tidyr::unnest, i.e.,

> dt %>%
+   unnest(val)
# A tibble: 9 × 2
    grp val
  <dbl> <chr>
1     1 a
2     2 b
3     2 c
4     1 d
5     1 e
6     1 f
7     3 g
8     4 h
9     4 i

I am wondering if we can implement it by using the data.table only.

Watch out the order of values in the column grp, I would like to preserve the order like 1,2,1,3,4 rather than 1,1,2,3,4.


My data.table Attempt

My attempt is as below

> dt[, id := .I][, lapply(.SD, unlist), id][, id := NULL][]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

or

> dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

but I guess there might be some more concise and elegant way to do this, e.g., without creating an auxiliary column id or using rep + lengths.

Any idea? Much appreciated!

ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • 1
    Links to several SO posts here: [Create an efficient unnest function](https://github.com/Rdatatable/data.table/issues/2146). Maybe some relevant? – Henrik Sep 13 '22 at 20:17
  • 1
    E.g. [How to 'unlist' a column in a data.table](https://stackoverflow.com/questions/44336733/how-to-unlist-a-column-in-a-data-table), with suggestions similar to those here. – Henrik Sep 17 '22 at 11:37
  • I really like the second solution (with rep and lengths). – nicola Sep 21 '22 at 13:37

4 Answers4

8

One more option:

dt[, .(grp, val = unlist(val)), by = .I][, !"I"]
#      grp    val
#    <num> <char>
# 1:     1      a
# 2:     2      b
# 3:     2      c
# 4:     1      d
# 5:     1      e
# 6:     1      f
# 7:     3      g
# 8:     4      h
# 9:     4      i

PS. If you are using data.table 1.4.2 or older by = .I won't work. Instead you can use:

dt[, .(grp, val = unlist(val)), by = 1:nrow(dt)][, !"nrow"]

PS2. I think your dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))] is neater.

s_baldur
  • 29,441
  • 4
  • 36
  • 69
7

Another option without group by like this:

library(data.table)
dt[rep(1:.N, lengths(val))][, val:=unlist(dt$val)][]
#>    grp val
#> 1:   1   a
#> 2:   2   b
#> 3:   2   c
#> 4:   1   d
#> 5:   1   e
#> 6:   1   f
#> 7:   3   g
#> 8:   4   h
#> 9:   4   i

Created on 2022-09-16 with reprex v2.0.2

Quinten
  • 35,235
  • 5
  • 20
  • 53
1

This should work:

dt <- dt[, .(grp, unlist(val)), by=.(rleid(grp))][, rleid := NULL]
koolmees
  • 2,725
  • 9
  • 23
0

I guess this works:

dt[, lapply(.SD, unlist), by = .(grp)]

   grp val
1:   1   a
2:   1   d
3:   1   e
4:   1   f
5:   2   b
6:   2   c
7:   3   g
8:   4   h
9:   4   i
Arthur Welle
  • 586
  • 5
  • 15
  • 1
    It is a nice attempt, but the `grp` values should retain the same order as in the original `dt`, e.g., `1,2,1,3,4` rather than `1,1,2,3,4`. – ThomasIsCoding Sep 13 '22 at 13:05