
I am loading some JSON data using jsonlite which is resulting in some nested data similar (in structure) to the toy data.table dt constructed below. I want to be able to use rbindlist to bind the nested data.tables together.

Setup:

> dt <- data.table(a=c("abc", "def", "ghi"), b=runif(3))
> dt[, c:=list(list(data.table(d=runif(4), e=runif(4))))]
> dt
     a         b            c
1: abc 0.2623218 <data.table>
2: def 0.7092507 <data.table>
3: ghi 0.2795103 <data.table>

Using the NSE built into data.table, I can do:

> rbindlist(dt[, c])
            d          e
 1: 0.8420476 0.26878325
 2: 0.1704087 0.59654706
 3: 0.6023655 0.42590380
 4: 0.9528841 0.06121386
 5: 0.8420476 0.26878325
 6: 0.1704087 0.59654706
 7: 0.6023655 0.42590380
 8: 0.9528841 0.06121386
 9: 0.8420476 0.26878325
10: 0.1704087 0.59654706
11: 0.6023655 0.42590380
12: 0.9528841 0.06121386

which is exactly what I expect/want. Furthermore, the original dt remains unmodified:

> dt
     a         b            c
1: abc 0.2623218 <data.table>
2: def 0.7092507 <data.table>
3: ghi 0.2795103 <data.table>

However, when manipulating the data.table within a function I generally want to use get with string column names:

> rbindlist(dt[, get("c")])
           V1         V2
 1: 0.8420476 0.26878325
 2: 0.1704087 0.59654706
 3: 0.6023655 0.42590380
 4: 0.9528841 0.06121386
 5: 0.8420476 0.26878325
 6: 0.1704087 0.59654706
 7: 0.6023655 0.42590380
 8: 0.9528841 0.06121386
 9: 0.8420476 0.26878325
10: 0.1704087 0.59654706
11: 0.6023655 0.42590380
12: 0.9528841 0.06121386

Now the column names have been lost and replaced by the default "V1" and "V2" values. Is there a way to retain the names?

In the development version (v1.9.5), the problem goes beyond lost names. After executing rbindlist(dt[, get("c")]), the entire data.table becomes corrupt and can no longer be printed:

> dt
Error in FUN(X[[3L]], ...) : 
  Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.

To be clear, the lost names issue happens in both v1.9.4 (installed from CRAN) and v1.9.5 (installed from github), but the corrupt data.table issue seems to affect v1.9.5 only (as of today - July 8, 2015).

If I could stick with the NSE version, everything would run smoothly. My issue is that doing so would mean writing multiple NSE functions that call each other, which gets messy pretty fast.

Are there any known (non-NSE-based) workarounds? Also, is this a known issue?

Matt Pollock

1 Answer


This must have been fixed in the five years since this question was asked. I now get the expected results: the column names are retained with get("c"), and dt stays intact afterwards.

> library(data.table)
data.table 1.13.3 IN DEVELOPMENT built 2020-11-17 18:11:47 UTC; jan using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
> dt <- data.table(a=c("abc", "def", "ghi"), b=runif(3))
> dt[, c:=list(list(data.table(d=runif(4), e=runif(4))))]
> dt
     a         b                 c
1: abc 0.2416624 <data.table[4x2]>
2: def 0.0222938 <data.table[4x2]>
3: ghi 0.3510681 <data.table[4x2]>
> rbindlist(dt[, c])
            d          e
 1: 0.5485731 0.32366420
 2: 0.5457945 0.45173251
 3: 0.6796699 0.03783026
 4: 0.4442776 0.03121024
 5: 0.5485731 0.32366420
 6: 0.5457945 0.45173251
 7: 0.6796699 0.03783026
 8: 0.4442776 0.03121024
 9: 0.5485731 0.32366420
10: 0.5457945 0.45173251
11: 0.6796699 0.03783026
12: 0.4442776 0.03121024
> rbindlist(dt[, get("c")])
            d          e
 1: 0.5485731 0.32366420
 2: 0.5457945 0.45173251
 3: 0.6796699 0.03783026
 4: 0.4442776 0.03121024
 5: 0.5485731 0.32366420
 6: 0.5457945 0.45173251
 7: 0.6796699 0.03783026
 8: 0.4442776 0.03121024
 9: 0.5485731 0.32366420
10: 0.5457945 0.45173251
11: 0.6796699 0.03783026
12: 0.4442776 0.03121024
> dt
     a         b                 c
1: abc 0.2416624 <data.table[4x2]>
2: def 0.0222938 <data.table[4x2]>
3: ghi 0.3510681 <data.table[4x2]>
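
For code that takes the column name as a string, one option that does not rely on get() at all is plain list extraction with [[. The sketch below is only an illustration under that assumption: bind_nested is a hypothetical helper name (not part of data.table), and the toy data mirrors the setup from the question.

```r
library(data.table)

# Hypothetical helper (not a data.table function): bind the nested tables
# stored in a list column whose name is passed as a string.
bind_nested <- function(dt, col) {
  # dt[[col]] is ordinary list extraction, so nothing is evaluated
  # inside `[.data.table` and no get() call is needed.
  rbindlist(dt[[col]])
}

# Toy data with the same structure as in the question
dt <- data.table(a = c("abc", "def", "ghi"), b = runif(3))
dt[, c := list(list(data.table(d = runif(4), e = runif(4))))]

res <- bind_nested(dt, "c")
names(res)  # names of the bound result
print(dt)   # the original table should still print normally
```

Because the column is pulled out as a plain list before rbindlist sees it, the result should keep the d and e names from the nested tables, and the helper can be called from other functions without any NSE plumbing.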
jangorecki