Aggregating over continuous variables

Question

Let's say I have the following data frame in R:

d <- read.table(text='Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32', header=TRUE)

I want to calculate the average of Rate2 for each quartile of Rate1. For example, what is the average of Rate2 for observations from the 0th to 25th percentile of Rate1, the 26th to 50th percentile of Rate1, and so on.

How can I do this?

Seems like your problem is extremely similar to this question: [Using CUT and Quartile to generate breaks in R function](https://stackoverflow.com/questions/11728419/using-cut-and-quartile-to-generate-breaks-in-r-function) — Matt Summersgill, Feb 15 '18 at 15:36

Sotos · Answer 1 · 2018-02-15T15:45:27.323

2

You can use cut to find the quartiles and aggregate in order to summarise or use ave in order to create a column, depending on how you want the output (Thanks to @akrun for the comment)

aggregate(Rate2 ~ cut(Rate1, breaks = quantile(Rate1), include.lowest = TRUE), d, mean)

#or

with(d, ave(Rate2, cut(Rate1, breaks = quantile(Rate1), include.lowest = TRUE), FUN = mean))

edited Feb 15 '18 at 15:45

answered Feb 15 '18 at 15:39

Sotos

51,121
6
32
66

score 1 · Answer 2 · answered Feb 15 '18 at 15:38

How about finding which variables are in the quantiles you desiere and then computing the mean for those, like this:

q_use = list(c(0,0.25),c(0.26,0.5),c(0.51,0.75),c(0.76,1))
pos_use = lapply(1:length(q_use), function(x) quantile(d$Rate1,q_use[[x]]))
average_q = lapply(1:length(q_use), function(x) 
  mean(d[which(d$Rate1>=pos_use[[x]][1] & d$Rate1<=pos_use[[x]][2]),"Rate2"]))

Aggregating over continuous variables

2 Answers2