0

I have a data.table in R where one column C contains vectors, e.g. c('a', 'b', 'c'). I want to expand the vector into rows in the data.table, so that for the value c('a', 'b', 'c') in column C, it will generate three rows with the other columns copied and the value in column C being a, b, and c respectively.

library(data.table)

DT <- data.table(
  A = c(1, 2, 3),
  B = c("abc", "bbc", "ccc"),
  C = list(c("a", "b", "c"), c("d", "e"), c("f"))
)
DT
   A   B     C # in real data, there are many cols
1: 1 abc a,b,c
2: 2 bbc   d,e
3: 3 ccc     f

my expacted:

   A   B C
1: 1 abc a
2: 1 abc b
3: 1 abc c
4: 2 bbc d
5: 2 bbc e
6: 3 ccc f

I find a code with tidyverse from expanding R data.table by splitting on one column , but I dont want to include extra library if possible.

How can I achieve this using data.table syntax in R?

zhang
  • 185
  • 7
  • 1
    your reproducible example doesn't match your desired output – Mark Aug 19 '23 at 11:56
  • 1
    Long-running discussion among the maintainers on this, see [#2146](https://github.com/Rdatatable/data.table/issues/2146) and [#3672](https://github.com/Rdatatable/data.table/issues/3672). One strong opinion so far is that when `unlist` (as in both Mark's and s_baldur's answers) does not suffice, then `tidyr::unnest` is performant and robust. In this example, `tidyr::unnest(DT, C)` works well, and handles many problems one might see. If you're avoiding package bloat (a normal effort), then I think you have your answers :-) – r2evans Aug 19 '23 at 18:08
  • @Mark I have update the typo, thank you, @r2evans sorry for repeat question, I think that because I use `expand` as key word but not `unlist`, and thanks your suggestion – zhang Aug 21 '23 at 02:51

3 Answers3

1

One option:

DT[, c(.SD[rep(seq_len(.N), lengths(C))], C = .(unlist(C))), .SDcols = !"C"]
#        A      B      C
#    <num> <char> <char>
# 1:     1    foo      a
# 2:     1    foo      b
# 3:     1    foo      c
# 4:     2    bar      d
# 5:     2    bar      e
# 6:     3    baz      f
s_baldur
  • 29,441
  • 4
  • 36
  • 69
1
DT[, .(A, B, C = unlist(C)), by = .I]
Mark
  • 7,785
  • 2
  • 14
  • 34
1

If you are working with data.table, try

> DT[,lapply(.SD, unlist), A:B]
   A   B C
1: 1 foo a
2: 1 foo b
3: 1 foo c
4: 2 bar d
5: 2 bar e
6: 3 baz f
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81