1

I am trying to simplify a large dataset (52k+ rows) by finding the maximum value for every two-week interval. I have already assigned week number values to every row and used the aggregate() function to find the maximum value for each week.

Simplified sample data:

week <- c(1:5, 5, 7:10)
conc <- rnorm(mean=50, sd=20, n=10)
df <- data.frame(week,conc)

aggregate(df, by=list(week), FUN=max)

However, I am stuck on how to further aggregate based on two-week intervals (e.g. weeks 1 & 2, weeks 3 & 4, ...). It's not as simple as combining every other row, since not every week was sampled exactly once.

I'm assuming there's a simple solution, I just haven't found it yet.

Thanks!

keta
  • Just showed this one on another SO query. Try `tapply(df$conc, rep(1:(nrow(df)/2), each=2), FUN = sum)` – Sagar Sep 08 '17 at 18:21
  • 1
    What should happen at week 5? Should the week 5 value just be returned? NA? – John Paul Sep 08 '17 at 18:21
  • Sagar: It still seems like it would throw off the pairings if there is a missing week. Unless I'm interpreting your code wrong... – keta Sep 08 '17 at 18:28
  • John Paul: Week 5 would simply be the maximum at week 5 since there is no Week 6 – keta Sep 08 '17 at 18:29
  • @keta - I ran it for the sample data set you provided. Didn't realize week 6 was missing. – Sagar Sep 08 '17 at 18:31
  • 1
    @keta Why don't you create another column called `bi_week` as `(df$week+1)%/%2` and `aggregate` based on that column? – Safwan Sep 08 '17 at 18:32
  • I suggest you read this post. Lots of good hacks in there. https://stackoverflow.com/questions/17389533/aggregate-values-of-15-minute-steps-to-values-of-hourly-steps – CCurtis Sep 08 '17 at 18:39
  • It is hard to tell if we all get the same answers due to the random data. If you could use `set.seed` or give set values for `conc`, that would make checking a little easier. – John Paul Sep 08 '17 at 18:46
  • 1
    @MusafirSafwan Clever! I wanted to do something like that but wasn't sure how to make the code work. Thanks! – keta Sep 08 '17 at 18:52
  • @JohnPaul Good tip. – keta Sep 08 '17 at 18:52

3 Answers

1
week <- c(1:5, 5, 7:10)

bi_week <- (week+1)%/%2

conc <- rnorm(mean=50, sd=20, n=10)

df <- data.frame(week,bi_week,conc)

aggregate(df, by=list(bi_week), FUN=max)
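A quick check of what the integer-division grouping produces (not part of the original answer, just the mapping it yields):

```r
# integer division pairs consecutive weeks into the same group
week <- 1:10
(week + 1) %/% 2
# 1 1 2 2 3 3 4 4 5 5
```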
Safwan
0
library(purrr)
library(dplyr)

Odds <- seq(1, max(week), by = 2)
Evens <- seq(2, max(week), by = 2)

map2(.x=Odds,.y=Evens, .f=function(x,y) {df %>% 
filter(week==x | week==y) %>% select(conc) %>% max})

I first made vectors of the odd and even week numbers. Then, using the purrr package, I fed these pairwise (1 & 2, then 3 & 4, etc.) into a function that uses the dplyr package to filter to just those weeks, select the `conc` values, and take the max.

Here is the output:

> map2(.x=Odds,.y=Evens, .f=function(x,y) {df %>% filter(week==x | week==y) %>% select(conc) %>% max})
[[1]]
[1] 68.38759

[[2]]
[1] 56.9231

[[3]]
[1] 77.23965

[[4]]
[1] 49.39443

[[5]]
[1] 49.38465

Note: you could use `map2_dbl` in place of `map2` and get a numeric vector instead.
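For instance, a sketch of the `map2_dbl` variant, assuming the same `df` and the odd/even week vectors from above (the `set.seed` call is hypothetical, added here only so the run is reproducible):

```r
library(purrr)
library(dplyr)

set.seed(42)  # hypothetical seed, only for reproducibility
df <- data.frame(week = c(1:5, 5, 7:10),
                 conc = rnorm(mean = 50, sd = 20, n = 10))

Odds  <- seq(1, max(df$week), by = 2)
Evens <- seq(2, max(df$week), by = 2)

# map2_dbl returns a plain numeric vector instead of a list
res <- map2_dbl(.x = Odds, .y = Evens,
                .f = function(x, y) df %>%
                  filter(week == x | week == y) %>%
                  pull(conc) %>%
                  max())
res
```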

Edit: removed the part about df2 as that was an error.

John Paul
0

Use pracma::ceil to grab each bi-weekly pair

library(pracma)
aggregate(df, by=list(ceil(df$week/2)), FUN=max)

Output

  Group.1 week     conc
1       1    2 76.09191
2       2    4 50.20154
3       3    5 54.93041
4       4    8 69.17820
5       5   10 74.67518

ceil(df$week/2)
# 1 1 2 2 3 3 4 4 5 5
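If you'd rather avoid the extra dependency, base R's `ceiling()` gives the same grouping here (an equivalent alternative, not what this answer used):

```r
# base-R equivalent of pracma::ceil for this grouping
week <- c(1:5, 5, 7:10)
ceiling(week / 2)
# 1 1 2 2 3 3 4 4 5 5
```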
CPak