I have a data.table where more than two columns are of type list. I would like to expand these columns so that each element of the list becomes a new column. I am looking for a more elegant way than "manually" expanding each column and then joining the tables together.
the setup
Edit: (providing the json from which I obtained the data.table)
So I have a json file like this:
[
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.352245",
"dest_lat": "49.981314",
"local_id": "AD.22045848",
"full_address": "Zderazská 98/3, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.226975",
"dest_lat": "50.051702",
"local_id": "AD.27261433",
"full_address": "Západní 458, 25303 Chýně"
}
],
"destination_addresses": [
"Zderazská 98/3, 153 00 Praha-Radotín, Czechia",
"Západní 458, 253 01 Chýně, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.6 km",
"value": 1620
},
"duration": {
"text": "5 mins",
"value": 272
},
"duration_in_traffic": {
"text": "5 mins",
"value": 277
},
"status": "OK"
},
{
"distance": {
"text": "19.3 km",
"value": 19313
},
"duration": {
"text": "22 mins",
"value": 1343
},
"duration_in_traffic": {
"text": "24 mins",
"value": 1424
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "0.7 km",
"value": 691
},
"duration": {
"text": "2 mins",
"value": 101
},
"duration_in_traffic": {
"text": "2 mins",
"value": 99
},
"status": "OK"
},
{
"distance": {
"text": "18.7 km",
"value": 18655
},
"duration": {
"text": "21 mins",
"value": 1246
},
"duration_in_traffic": {
"text": "22 mins",
"value": 1336
},
"status": "OK"
}
]
}
],
"status": "OK"
},
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.36053",
"dest_lat": "49.981687",
"local_id": "AD.22047131",
"full_address": "Zítkova 235/7, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.361052",
"dest_lat": "49.988529",
"local_id": "AD.22054952",
"full_address": "Strážovská 1053/33, Radotín, 15300 Praha 5"
}
],
"destination_addresses": [
"Zítkova 235/7, 153 00 Praha-Radotín, Czechia",
"Strážovská 1053/33, 153 00 Praha-Radotín, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1445
},
"duration": {
"text": "4 mins",
"value": 248
},
"duration_in_traffic": {
"text": "4 mins",
"value": 247
},
"status": "OK"
},
{
"distance": {
"text": "1.9 km",
"value": 1933
},
"duration": {
"text": "4 mins",
"value": 264
},
"duration_in_traffic": {
"text": "4 mins",
"value": 267
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1374
},
"duration": {
"text": "4 mins",
"value": 232
},
"duration_in_traffic": {
"text": "4 mins",
"value": 241
},
"status": "OK"
},
{
"distance": {
"text": "1.3 km",
"value": 1274
},
"duration": {
"text": "3 mins",
"value": 167
},
"duration_in_traffic": {
"text": "3 mins",
"value": 174
},
"status": "OK"
}
]
}
],
"status": "OK"
}
]
Which I read in like:
library(jsonlite)
library(data.table)
data <- read_json('./path_to_that_json/that_json.json')
This results in a list of length 2.
I can convert this into a data.table like:
dt <- rbindlist(lapply(data, as.data.table))
Which then results in a data.table like:
origins destinations destination_addresses origin_addresses
1: <list> <list> Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
2: <list> <list> Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
3: <list> <list> Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
4: <list> <list> Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
rows status
1: <list> OK
2: <list> OK
3: <list> OK
4: <list> OK
This means I have several columns of type list and I would like to expand them.
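The list columns can also be picked out programmatically instead of being read off the printout. A minimal sketch on a made-up table (`toy` and `list_cols` are illustrative names, not part of my real data):

```r
library(data.table)

# Toy stand-in for dt: one plain column and two list columns (made-up shape).
toy <- data.table(
  status  = c("OK", "OK"),
  origins = list(list(orig_lon = "14.36784"), list(orig_lon = "14.352792")),
  rows    = list(list(elements = list()), list(elements = list()))
)

# is.list() is TRUE for list columns and FALSE for atomic ones.
list_cols <- names(toy)[vapply(toy, is.list, logical(1))]
print(list_cols)
```

On the real dt this should presumably return origins, destinations and rows, which is the set of columns I would like to pass to a single expanding call.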
what kinda works
I know that to expand just one column, I can do:
dt[, r := as.character(.I)]
res1 <- dt[, rbindlist(setNames(origins, r), idcol = "r")]
(I found that here: Expand list column of data.tables)
Now, I could expand multiple columns by repeating this call and joining the results on the column r. This could look like:
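# expand each list column separately, keeping the row index r as a character key
orig <- dt[, rbindlist(origins, idcol = "r")][, r := as.character(r)]
dest <- dt[, rbindlist(destinations, idcol = "r")][, r := as.character(r)]
# join both expansions back on r and drop the original list columns
res1 <- dt[orig, on = "r"][, `:=`(origins = NULL, destinations = NULL)][dest, on = "r"]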
Which would give me the desired output of:
destination_addresses origin_addresses rows status r
1: Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 1
2: Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 2
3: Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 3
4: Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 4
orig_lon orig_lat local_id full_address dest_lon dest_lat i.local_id
1: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848
2: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433
3: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131
4: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952
i.full_address
1: Zderazská 98/3, Radotín, 15300 Praha 5
2: Západní 458, 25303 Chýne
3: Zítkova 235/7, Radotín, 15300 Praha 5
4: Strážovská 1053/33, Radotín, 15300 Praha 5
My question is:
Is there a more elegant and more efficient way of expanding several columns? Ideally, I would like to supply a vector of column names and make one call that expands all of them and returns the above result.
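To make the kind of interface I have in mind concrete, here is a rough sketch on toy data. The helper `expand_list_cols` is hypothetical (it is not a data.table function), and it assumes that every cell of the listed columns is a flat named list of scalars, as in origins and destinations:

```r
library(data.table)

# Hypothetical helper (not part of data.table): expand every column named in
# `cols`, assuming each cell is a flat named list of scalars.
expand_list_cols <- function(dt, cols) {
  # One wide table per list column, one row per original row.
  expanded <- lapply(cols, function(col) rbindlist(dt[[col]]))
  kept <- dt[, setdiff(names(dt), cols), with = FALSE]
  out <- do.call(cbind, c(list(kept), expanded))
  # Names shared by several list columns (e.g. local_id) get a numeric suffix.
  setnames(out, make.unique(names(out)))
  out
}

# Toy table using a few values from the json above.
toy <- data.table(
  status = c("OK", "OK"),
  origins = list(list(orig_lon = "14.36784", local_id = "AD.22045279"),
                 list(orig_lon = "14.352792", local_id = "AD.22055428")),
  destinations = list(list(dest_lon = "14.352245", local_id = "AD.22045848"),
                      list(dest_lon = "14.226975", local_id = "AD.27261433"))
)

res <- expand_list_cols(toy, c("origins", "destinations"))
print(names(res))
```

This binds columns by position rather than joining on r, so it only makes sense when each list cell expands to exactly one row. Whether it is actually more efficient than the joins is something I have not measured.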
Also, with the column rows, the expanding is a bit more complicated: so far I am creating a new column of type list which does not include the status record. Something like:
dt[, rows2 := lapply(rows, function(x) list("distance" = (x[[1]][[1]]["distance"]),
"duration" = (x[[1]][[1]]["duration"]),
"duration_in_traffic" = (x[[1]][[1]]["duration_in_traffic"])))]
And then the above procedure can be used to expand rows2 into three columns of type list, which can subsequently be expanded using the same procedure. Now, this approach sucks for the obvious reason that it is not really straightforward for anyone who reads the code after me. Moreover, it takes a lot of typing. I think there must be a far more elegant way of wrangling this.
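One possible direction for rows, sketched only on two elements copied from the json above: base R's unlist() flattens a nested list and joins the names with a dot, so each element becomes one flat record. The names `elements`, `flat` and `res` are illustrative, and the sketch assumes every element has the same fields:

```r
library(data.table)

# Two "elements" entries copied from the first row of the json above.
elements <- list(
  list(distance = list(text = "1.6 km", value = 1620),
       duration = list(text = "5 mins", value = 272),
       duration_in_traffic = list(text = "5 mins", value = 277),
       status = "OK"),
  list(distance = list(text = "19.3 km", value = 19313),
       duration = list(text = "22 mins", value = 1343),
       duration_in_traffic = list(text = "24 mins", value = 1424),
       status = "OK")
)

# unlist() flattens the nesting and builds names such as distance.text;
# all values become character because text and numbers are mixed.
flat <- lapply(elements, function(e) as.list(unlist(e)))
res <- rbindlist(flat)
print(names(res))
```

The numeric fields come back as character here, so they would need converting afterwards (for example with as.numeric), and the status field is kept rather than dropped.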