
I have a df/zoo/xts/whatever that is split by day of week. I'd like to further split this by week for each entry.

An example would be for Friday: there is a list of ids with an associated time for each id. These times could be for any Friday in a year's span. I'd like to make a new df that has each id along with the counts for each week (in sequential order) on that Friday.

It would look something like the following where each w column is a different Friday's count:

    id w1 w2 w3 w4
1 id_1  1  2  2  8
2 id_2  3  1  5  2
3 id_3  7  4 10  7

dput:

structure(list(id = c("id_1", "id_2", "id_3"), w1 = c(1, 3, 7
), w2 = c(2, 1, 4), w3 = c(2L, 5L, 10L), w4 = c(8L, 2L, 7L)), .Names = c("id", 
"w1", "w2", "w3", "w4"), row.names = c(NA, 3L), class = "data.frame")

This seems like it's ripe for aggregate, but I can't quite get the syntax right. Other things I've tried are below:

# Applies sum to everything, which doesn't make sense in this context
apply.weekly(friday, sum)

# I considered doing something like getting the unique weeks with:
as.numeric(unique(format(friday[,2], "%U")))
# and then generating each week, getting the counts for each user, and then making a new df from this process. But this seems very inefficient.  

Edit: output from str(data[1:20,]):

'data.frame':   20 obs. of  2 variables:
 $ id  : num  1 2 3 4 5 1 2 3 3 2 ...
 $ time: POSIXct, format: "2011-04-25 14:00:00" "2011-04-28 20:00:00" "2011-05-03 06:00:00" "2011-05-06 14:00:00" ...

output from dput(data[1:20,]):

structure(list(id = c(1, 2, 3, 4, 5, 1, 2, 3, 3, 2, 1, 4, 3, 
2, 1, 4, 3, 2, 1, 7), time = structure(c(1303754400, 1304035200, 
1304416800, 1304704800, 1304920800, 1305252000, 1305428400, 1305522000, 
1305774000, 1306404000, 1306422000, 1308261600, 1308290400, 1308340800, 
1308542400, 1308715200, 1308722400, 1308844800, 1309575600, 1309730400
), class = c("POSIXct", "POSIXt"))), .Names = c("id", "time"), row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), class = "data.frame")
user592419
  • Please post what you have so far – Michael Durrant Nov 12 '11 at 23:08
  • you'll get much better answers if you update your question after reviewing [this question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and revising what you've written thus far. – Chase Nov 12 '11 at 23:11
  • 1
    Edited to reflect your comment. – user592419 Nov 12 '11 at 23:22
  • 2
    Perhaps others can follow this, but I can't. What would actually be helpful is if you posted the output from `str(data)` and made a reproducible example using some fake data. – joran Nov 13 '11 at 00:46
  • Yeah, dput us pls. dput(datas[1:20,]) or something like that – Rguy Nov 13 '11 at 00:53
  • Thanks for providing the dput. Somebody else might understand what you want here, but I am still unclear. Could you make an artificial sample data frame of what you want your result to be? – Rguy Nov 13 '11 at 02:41

1 Answer


If I am understanding what you want, you need to make additional columns for the day of the week (so you can identify that) and for the week of the year (so that you can end up with separate columns for each). Using the data that you gave a dput() for:

data$day.of.week <- format(data$time, "%A")
data$week.of.year <- format(data$time, "%U")

Effectively you now want to reshape the data, so use the reshape2 package (not the only way, but the one I am most familiar with):

library("reshape2")

dcast(data[data$day.of.week == "Friday", ], id ~ week.of.year,
    value.var = "time", fun.aggregate = length)

In that example, I subsetted the data to just get the Fridays. If you wanted to handle all the days, but each day separately, the plyr package can help with that iteration.

library("plyr")

dlply(data, .(day.of.week), dcast, id ~ week.of.year,
    value.var = "time", fun.aggregate = length)

The results of these two are:

> dcast(data[data$day.of.week=="Friday",], id~week.of.year, value.var="time", fun.aggregate=length)
  id 18 24 26
1  1  0  0  1
2  2  0  1  0
3  4  1  0  0

> dlply(data, .(day.of.week), dcast, id~week.of.year, value.var="time", fun.aggregate=length)
$Friday
  id 18 24 26
1  1  0  0  1
2  2  0  1  0
3  4  1  0  0

$Monday
  id 17
1  1  1

$Saturday
  id 19
1  2  1

$Sunday
  id 19 20 25 27
1  1  0  0  1  0
2  3  0  1  0  0
3  5  1  0  0  0
4  7  0  0  0  1

$Thursday
  id 17 19 21 24 25
1  1  0  1  1  0  0
2  2  1  0  1  0  1
3  3  0  0  0  1  0
4  4  0  0  0  1  0

$Tuesday
  id 18 25
1  3  1  1
2  4  0  1

$Wednesday
  id 20
1  3  1

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  day.of.week
1      Friday
2      Monday
3    Saturday
4      Sunday
5    Thursday
6     Tuesday
7   Wednesday
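For comparison, the same per-day counts can also be produced with base R's `table()`, since the desired result is just a contingency table of id by week-of-year; this can be noticeably faster on large data (a sketch, not from the original answer; note that `format()`'s day and week extraction depends on the session time zone):

```r
## cross-tabulate id against week-of-year for the Friday rows only;
## rows are ids, columns are %U week numbers, cells are counts
fri <- data[format(data$time, "%A") == "Friday", ]
table(fri$id, format(fri$time, "%U"))
```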
Brian Diggs
  • Yes, this is pretty much what I am looking for. I tried duplicating it and found that, at the size of the data I am doing it on, the dcast/dlply commands take prohibitively long. Might just take it off R to do this part. – user592419 Nov 13 '11 at 07:01
  • I think (but am not sure) that the latest reshape2 and plyr packages support parallelisation. If you have a multi-core machine (or access to multiple machines) then that might speed things up for you. – PaulHurleyuk Nov 13 '11 at 14:14
  • I tried this; it still takes a while. But the speedup is noticeable for some sets. Thanks for the tip. – user592419 Nov 13 '11 at 18:59