Group rows in data.frame and find quantile

Question

I have the following data:

set.seed(789)
df_1 = data.frame(a = 22, b = 24, c = rnorm(10))
df_2 = data.frame(a = 44, b = 24, c = rnorm(10))
df_3 = data.frame(a = 33, b = 99, c = rnorm(10))

df_all = rbind(df_1, df_2, df_3)

I need to group df_all by columns a and b, and then find the 50th percentile (the 0.5 quantile) of column c within each group.

This can be done individually for each data frame, as follows:

df_1_q = quantile(df_1$c, probs = 0.50)
df_2_q = quantile(df_2$c, probs = 0.50)
df_3_q = quantile(df_3$c, probs = 0.50)

However, my real df_all is larger than this, so doing it one data frame at a time is not practical.

More generally, how can I group the rows of a data.frame and apply a given function to each group?

thanks

`tapply()` can do that. https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family https://stackoverflow.com/questions/39462610/groupwise-computation-in-r — jogo, Apr 30 '19 at 06:51
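As a minimal base R sketch of what this comment suggests (not code from the comment itself), `aggregate()` and `tapply()` can both compute the per-group quantile. The result names `res_agg` and `res_tap` are made up for illustration, and the data is rebuilt from the question so the block runs on its own.

```r
# Rebuild the example data from the question.
set.seed(789)
df_all <- rbind(
  data.frame(a = 22, b = 24, c = rnorm(10)),
  data.frame(a = 44, b = 24, c = rnorm(10)),
  data.frame(a = 33, b = 99, c = rnorm(10))
)

# aggregate(): returns a data.frame with one row per (a, b) combination.
res_agg <- aggregate(
  c ~ a + b,
  data = df_all,
  FUN = function(x) quantile(x, probs = 0.5, names = FALSE)
)

# tapply(): returns a named vector. interaction(..., drop = TRUE) builds one
# grouping factor from a and b and drops combinations that never occur.
res_tap <- tapply(
  df_all$c,
  interaction(df_all$a, df_all$b, drop = TRUE),
  quantile,
  probs = 0.5,
  names = FALSE
)

print(res_agg)
print(res_tap)
```

`aggregate()` is convenient when a data.frame is wanted back; `tapply()` gives a compact named vector whose names here take the form "a.b".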

score 1 · Accepted Answer · answered Apr 30 '19 at 06:38

You could use dplyr for that:

library(dplyr)
df_all %>%
  group_by(a, b) %>%
  summarise(quantile = quantile(c, probs = 0.5))
# A tibble: 3 x 3
# Groups:   a [?]
      a     b quantile
  <dbl> <dbl>    <dbl>
1    22    24   -0.268
2    33    99   -0.234
3    44    24   -0.445

Or, using data.table:

library(data.table)
dt <- data.table(df_all)
dt[, list(quantile = quantile(c, probs = 0.5)), by = c("a", "b")]
    a  b       quantile
1: 22 24 -0.2679104
2: 44 24 -0.4450979
3: 33 99 -0.2336712
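For the more general part of the question (applying any function per group), one possible base R sketch uses `split()` followed by `vapply()`. The helper `group_apply` is a hypothetical name introduced here for illustration; it is not part of dplyr, data.table, or base R, and it assumes the function returns a single number per group.

```r
# Rebuild the example data from the question.
set.seed(789)
df_all <- rbind(
  data.frame(a = 22, b = 24, c = rnorm(10)),
  data.frame(a = 44, b = 24, c = rnorm(10)),
  data.frame(a = 33, b = 99, c = rnorm(10))
)

# Hypothetical helper: split one column by the grouping columns, then apply
# f to each piece. Extra arguments in ... are forwarded to f.
group_apply <- function(df, group_cols, value_col, f, ...) {
  # split() accepts a list of grouping vectors; drop = TRUE removes
  # combinations of a and b that do not occur in the data.
  pieces <- split(df[[value_col]], df[group_cols], drop = TRUE)
  # vapply() enforces that f returns exactly one numeric value per group.
  vapply(pieces, f, numeric(1), ...)
}

# Any percentile, not only the 50th:
q90 <- group_apply(df_all, c("a", "b"), "c", quantile, probs = 0.9, names = FALSE)

# Any other summary function works the same way:
grp_mean <- group_apply(df_all, c("a", "b"), "c", mean)

print(q90)
print(grp_mean)
```

The same idea carries over to the dplyr and data.table versions above by swapping `quantile(c, probs = 0.5)` for another call on `c`.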

Sonny


50th quantile is the median, so `median` would probably be a lot faster – Rohit Apr 30 '19 at 06:43
Yes, but the user may want some other percentile, not just the 50th. – Sonny Apr 30 '19 at 06:46
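A short sketch of the point made in these two comments: with R's default quantile type, `quantile(x, probs = 0.5)` and `median(x)` give the same value, so `median` can be dropped in when only the 50th percentile is needed. The variable names `by_median` and `by_quantile` are illustrative, and no timing claim is checked here.

```r
# Rebuild the example data from the question.
set.seed(789)
df_all <- rbind(
  data.frame(a = 22, b = 24, c = rnorm(10)),
  data.frame(a = 44, b = 24, c = rnorm(10)),
  data.frame(a = 33, b = 99, c = rnorm(10))
)

# Per-group median, passing median directly as the summary function.
by_median <- aggregate(c ~ a + b, data = df_all, FUN = median)

# Per-group 0.5 quantile with the default quantile type.
by_quantile <- aggregate(
  c ~ a + b,
  data = df_all,
  FUN = function(x) quantile(x, probs = 0.5, names = FALSE)
)

# Both summaries agree up to floating point tolerance.
same <- isTRUE(all.equal(by_median$c, by_quantile$c))
print(same)
```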