I am loading some JSON data using jsonlite which is resulting in some nested data similar (in structure) to the toy data.table dt constructed below.
Setup:
dt <- data.table(a=c("abc", "def", "ghi"), b=runif(3))
dt[, c:=list(list(data.table(d=runif(4), e=runif(4))))]
dt
a b c
1: abc 0.2623218 <data.table>
2: def 0.7092507 <data.table>
3: ghi 0.2795103 <data.table>
In my data there exists key/identifier columns (e.g., column a
in the above data.table
) but the nested data has no such identifier. I need to work with the nested data (see my related question) but before using rbindlist
I want to push the value of column a
as a constant column of the nested data.
For example, the value of c
in the first row is:
dt[1, c]
[[1]]
d e
1: 0.27951027 0.6350479
2: 0.45996594 0.6148956
3: 0.29377315 0.8432641
4: 0.01850265 0.1346082
My objective is to modify it such that it look like the following before binding the nested tables together so that I know which rows go with which record:
dt[1, c]
[[1]]
d e a
1: 0.27951027 0.6350479 abc
2: 0.45996594 0.6148956 abc
3: 0.29377315 0.8432641 abc
4: 0.01850265 0.1346082 abc
I can make this happen using a custom function that gets called within the J-expression and takes the nested data.table
for that row as well as the constant column (a
in this example) name and value. Something along the lines of:
add_nested_column <- function(dfl, name, value) {
df <- dfl[[1]] # data.frame wrapped in list per row, pull it out
df[, c(name):=value]
return(list(list(df)))
}
used like so:
dt[, key:=1:.N]
dt[, c:=add_nested_column(c, "a", a), by=key]
rbindlist(dt[, c])
d e a
1: 0.27951027 0.6350479 abc
2: 0.45996594 0.6148956 abc
3: 0.29377315 0.8432641 abc
4: 0.01850265 0.1346082 abc
5: 0.27951027 0.6350479 def
6: 0.45996594 0.6148956 def
7: 0.29377315 0.8432641 def
8: 0.01850265 0.1346082 def
9: 0.27951027 0.6350479 ghi
10: 0.45996594 0.6148956 ghi
11: 0.29377315 0.8432641 ghi
12: 0.01850265 0.1346082 ghi
This solution doesn't seem all that bad to me but I'm wondering if it is really the best practice or if there are features built into data.table
or other techniques that I am unaware of that would make it more efficient/concise.