0

I'd like to be able to "unpack" a vector returned by a function to multiple columns using data.table in R.

In Python it is fairly straightforward to unpack a tuple to multiple columns of a Pandas dataframe:

import pandas as pd

df = pd.DataFrame({'A':[(1,2,3), (3,4,5)]})  

print(df)
#            A
# 0  (1, 2, 3)
# 1  (3, 4, 5)

df[['A1','A2', 'A3']] = pd.DataFrame(df['A'].tolist(), index=df.index)

print(df[['A1','A2', 'A3']])
#    A1  A2  A3
# 0   1   2   3
# 1   3   4   5

I'd like to be able to do something similar in R with data.table.

set.seed(123)
DT <- data.table(group = c(rep('a', 10), rep('b', 10)),
                 value = c(sample(1:50, 10), sample(50:100, 10)))

# this works with two function calls to extract indexed vector elements:
DT[, .(x50 = quantile(value, probs = c(0.5, 0.8))[1],
       x80 = quantile(value, probs = c(0.5, 0.8))[2]),
   by = group]
#    group  x50  x80
# 1:     a 28.5 42.2
# 2:     b 75.5 79.2

# I'd like to be able to do this in a single function call:
DT[, .(.(x50, x80) = quantile(value, probs = c(0.5, 0.8))), by = group]
# Error: unexpected '=' in "DT[, .(.(x50, x80) ="

I frequently calculate multiple quantile values for very large datasets (10m+ rows). Calling quantile multiple times is inefficient, but I've been unable to find a pithy way to unpack the vector result returned by quantile() to multiple columns.

Mark Egge
  • 638
  • 8
  • 9

3 Answers3

3

Another option is:

DT[, as.list(quantile(value, probs = c(0.5, 0.8))), group]
   group  50%  80%
1:     a 28.5 42.2
2:     b 75.5 79.2
Uwe
  • 41,420
  • 11
  • 90
  • 134
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • Thanks, Ronak! I was thinking of `rbindlist` when i was doing typing and didnt remove after replacing `.` with `as.list` – chinsoon12 Jul 08 '21 at 04:18
0

Try

library(data.table)
 DT[, as.list(as.data.frame.list(quantile(value, probs = c(0.5, 0.8)))), group]
   group X50. X80.
1:     a 28.5 42.2
2:     b 75.5 79.2

Or wrap with setDT

DT[, setDT(as.list(quantile(value, probs = c(0.5, 0.8)))), group]
   group  50%  80%
1:     a 28.5 42.2
2:     b 75.5 79.2

or using unnest

library(tidyr)
setDT(DT[,  .(list(as.data.frame.list(quantile(value, 
       probs = c(0.5, 0.8))))), by = group][, unnest(.SD, V1)])[]
   group X50. X80.
1:     a 28.5 42.2
2:     b 75.5 79.2

or in tidyverse

library(dplyr)
library(tidyr)
DT %>%
     group_by(group) %>%
     summarise(out = list(as.list(quantile(value, probs = c(0.5, 0.8))))) %>%
     unnest_wider(c(out))
# A tibble: 2 x 3
  group `50%` `80%`
  <chr> <dbl> <dbl>
1 a      28.5  42.2
2 b      75.5  79.2
akrun
  • 874,273
  • 37
  • 540
  • 662
0

you also using 'dplyr'

 library(dplyr)
 DT %>% group_by(group) %>% summarize(X50=quantile(value,probs=c(0.5,0.8))[1],
                                      X80=quantile(value,probs=c(0.5,0.8))[2])
foreach
  • 89
  • 2