How to aggregate by group for specific cutoffs?

Question

I am using dplyr::filter(StrawsReleased >= 1000) in my code. The filter keeps only the rows where StrawsReleased (a cumulative sum at that point) is greater than or equal to 1000 within each group, but I would like to keep all ShortName values. I tried removing the filter from the pipeline, but that did not solve the problem.

Could someone help me?

ordered_data <- read.table(text= "ShortName  DiasColeta  StrawsReleased  Idade
BAUL  3  0  5
BAUL  6  0  5
BAUL  9  380  5
BAUL  25  90  5
BAUL  34  900  5
BAUL  68  1500  5
BAUL  90  900  5
BAUL  107  1500  5
JOUL  3  0  4
JOUL  9  0  4
JOUL  15  0  4
JOUL  29  1000  4
JOUL  35  1000  4
JOUL  45  2000  4
JOUL  67  0  4
JOUL  89  1000  4
JOUL  109  50  4", header = TRUE)

library(dplyr)

a2 <- ordered_data %>%
  mutate(StrawsReleased=cumsum(StrawsReleased), 
         Doses=(StrawsReleased >= 1000) + (StrawsReleased >= 3000) + 
           (StrawsReleased >= 5000), .by=ShortName) %>%
  filter(StrawsReleased >= 1000) %>%
  slice_head(by=c(ShortName, Doses)) %>%
  mutate(Doses=paste('Doses', c('1000', '3000', '5000')[Doses])) %>%
  select(ShortName, Doses, DiasColeta, Idade)

FYI: I am creating three "Doses" groups (cutoffs at 1,000, 3,000 and 5,000 cumulative StrawsReleased) and keeping DiasColeta and Idade for each.
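To make the Doses step concrete, here is a minimal base R sketch of how adding the three comparisons yields a tier index. The running totals are made-up illustrative values, not taken from the data. It also shows one possible reason why simply dropping the filter can fail: indexing the label vector with 0 drops those positions, so the label vector comes back shorter than the number of rows.

```r
# Hypothetical running totals for one animal (illustrative values only).
cum <- c(0, 470, 1370, 3770, 5270)

# Each comparison is TRUE/FALSE; adding them counts the cutoffs reached.
tier <- (cum >= 1000) + (cum >= 3000) + (cum >= 5000)
tier
# 0 0 1 2 3

# findInterval() gives the same count for sorted cutoffs.
stopifnot(all(tier == findInterval(cum, c(1000, 3000, 5000))))

labels <- c("1000", "3000", "5000")

# Indexing with 0 silently drops those positions, so only three labels
# come back for five rows.
labels[tier]
# "1000" "3000" "5000"

# Padding with NA keeps one label per row, including rows below 1000.
padded <- c(NA, labels)[tier + 1]
padded
# NA NA "1000" "3000" "5000"
```

With the padded lookup, rows that have not reached any cutoff get NA instead of disappearing, which may be closer to keeping every ShortName.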

After that, I would like to calculate the overall mean of DiasColeta for each Doses group and Idade value.

I am using:

a2 %>% 
  group_by(Doses, Idade) %>% 
  summarise(n=n(), 
            TempoParaProd=mean(DiasColeta))

Desired output:

ShortName Doses Idade DiasColeta
BAUL Doses 1000 5  34
BAUL Doses 3000 5  90
BAUL Doses 5000 5 107
JOUL Doses 1000 4  29
JOUL Doses 3000 4  45
JOUL Doses 5000 4  89

Could you edit your question to include sample data and desired output? For help, see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — jpsmith, Jul 18 '23 at 10:52
it's `slice_head()` that limits results to a single record per every `ShortName, Doses` combination. — margusl, Jul 18 '23 at 10:53
I would like to obtain one value of "DiasColeta" for each "ShortName" within each group (Doses). The problem is the filter, which removes some records. — BD'auria, Jul 18 '23 at 11:24

score 0 · Answer 1 · answered Jul 18 '23 at 16:37

First, compute the cumulative sums by group with ave. Next, check which cutoffs the running total has passed, which you can do with cut. The rest is aggregate, which here takes the mean of DiasColeta within each band.

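ordered_data2 <- within(ordered_data, {
  # Running total per animal; grouped by ShortName rather than Idade so that
  # animals sharing an Idade value are not accumulated together
  StrawsReleased_cs <- ave(StrawsReleased, ShortName, FUN=cumsum)
  # Bin the running total; StrawsReleased is overwritten with the band label
  StrawsReleased <- cut(StrawsReleased_cs, c(0, 1e3, 3e3, 5e3, Inf), 
                        include.lowest=TRUE, labels=c(0, 1000, 3000, 5000)) 
})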
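One detail of cut worth checking: by default its intervals are closed on the right, so a running total of exactly 1000 falls in the "0" band rather than the "1000" band. The sketch below uses made-up boundary values to compare the default with right=FALSE, which corresponds to the >= comparisons used in the question.

```r
# Illustrative boundary values (not from the data).
x      <- c(999, 1000, 1001, 3000, 5000, 5001)
breaks <- c(0, 1e3, 3e3, 5e3, Inf)
labs   <- c(0, 1000, 3000, 5000)

# Default: intervals are (a, b], so exactly 1000 is still in band "0".
closed_right <- as.character(cut(x, breaks, include.lowest = TRUE, labels = labs))
closed_right
# "0" "0" "1000" "1000" "3000" "5000"

# right = FALSE: intervals are [a, b), matching x >= 1000, x >= 3000, ...
closed_left <- as.character(cut(x, breaks, right = FALSE, labels = labs))
closed_left
# "0" "1000" "1000" "3000" "5000" "5000"
```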

aggregate(DiasColeta ~ StrawsReleased + Idade + ShortName,
          data=ordered_data2, subset=StrawsReleased != '0', FUN=mean)
#   StrawsReleased Idade ShortName DiasColeta
# 1           1000     5      BAUL         51
# 2           3000     5      BAUL         90
# 3           5000     5      BAUL        107
# 4           1000     4      JOUL         35
# 5           3000     4      JOUL         67
# 6           5000     4      JOUL        109
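Note that this output averages DiasColeta over every row in a band, so it differs from the desired output in the question (for example 51 rather than 34 for BAUL at 1000). If the goal is instead the first collection day on which each cutoff is reached, one possible base R sketch is below. The helper first_day and the column cum are names introduced here for illustration; an animal that never reaches a cutoff gets NA rather than being dropped, which is an assumption about what "keep all ShortName values" means.

```r
ordered_data <- read.table(text = "ShortName DiasColeta StrawsReleased Idade
BAUL 3 0 5
BAUL 6 0 5
BAUL 9 380 5
BAUL 25 90 5
BAUL 34 900 5
BAUL 68 1500 5
BAUL 90 900 5
BAUL 107 1500 5
JOUL 3 0 4
JOUL 9 0 4
JOUL 15 0 4
JOUL 29 1000 4
JOUL 35 1000 4
JOUL 45 2000 4
JOUL 67 0 4
JOUL 89 1000 4
JOUL 109 50 4", header = TRUE)

cutoffs <- c(1000L, 3000L, 5000L)

# Running total within each animal, in row order.
ordered_data$cum <- ave(ordered_data$StrawsReleased,
                        ordered_data$ShortName, FUN = cumsum)

# Hypothetical helper: for one animal, the first DiasColeta at which the
# running total reaches each cutoff (NA if the cutoff is never reached).
first_day <- function(d) {
  data.frame(
    ShortName  = d$ShortName[1],
    Doses      = paste("Doses", cutoffs),
    Idade      = d$Idade[1],
    DiasColeta = sapply(cutoffs, function(k) d$DiasColeta[match(TRUE, d$cum >= k)])
  )
}

res <- do.call(rbind, lapply(split(ordered_data, ordered_data$ShortName), first_day))
rownames(res) <- NULL
res
#   ShortName      Doses Idade DiasColeta
# 1      BAUL Doses 1000     5         34
# 2      BAUL Doses 3000     5         90
# 3      BAUL Doses 5000     5        107
# 4      JOUL Doses 1000     4         29
# 5      JOUL Doses 3000     4         45
# 6      JOUL Doses 5000     4         89
```

From there, aggregate(DiasColeta ~ Doses + Idade, data=res, FUN=mean) would give the per-group means asked about in the question; rows with NA are dropped by the formula interface's default na.action.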

Data:

ordered_data <- structure(list(ShortName = c("BAUL", "BAUL", "BAUL", "BAUL", 
"BAUL", "BAUL", "BAUL", "BAUL", "JOUL", "JOUL", "JOUL", "JOUL", 
"JOUL", "JOUL", "JOUL", "JOUL", "JOUL"), DiasColeta = c(3L, 6L, 
9L, 25L, 34L, 68L, 90L, 107L, 3L, 9L, 15L, 29L, 35L, 45L, 67L, 
89L, 109L), StrawsReleased = c(0L, 0L, 380L, 90L, 900L, 1500L, 
900L, 1500L, 0L, 0L, 0L, 1000L, 1000L, 2000L, 0L, 1000L, 50L), 
    Idade = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L, 4L)), class = "data.frame", row.names = c(NA, 
-17L))
