I have a data.table that lists the user id, the week number, whether the user did something (Processed, either 0 or 1), and a column called HowMany that I use only to count how many rows I have:

 library(data.table)

 data <- data.table(WeekNumber=c(33,33,33,34,34,33,33,34,34), 
          User=c(1,1,1,1,1,2,2,2,2), 
          Processed=c(1,1,0,0,1,0,1,0,1),
          HowMany=c(1,1,1,1,1,1,1,1,1))

I want to find, for each week, the sum of things done and not done, so I do something like this:

> dcast(setDT(data), WeekNumber~Processed, value.var="HowMany", sum) 
   WeekNumber 0 1
1:         33 2 3
2:         34 2 2

Now I'd like to find the average number of things done and not done by week. In this case I apparently also have to aggregate by user first, but I fail at this step:

> dcast(setDT(data), WeekNumber~Processed+User, value.var="HowMany", mean) 
  WeekNumber 0_1 0_2 1_1 1_2
1:        33   1   1   1   1
2:        34   1   1   1   1

whereas my desired result would be:

WeekNumber 0   1
        33 1 1.5
        34 1   1
user299791
  • Ok, your desired output just comes from `table`, like `data[, table(WeekNumber, Processed)/uniqueN(WeekNumber)]` – Frank Nov 10 '16 at 19:55
  • @Frank thanks, are you going to write this as an answer so I can accept it? – user299791 Nov 10 '16 at 20:04
  • Maybe it could be closed as a dupe of an older question instead like http://stackoverflow.com/q/25293045/ – Frank Nov 11 '16 at 05:17
  • I don't understand why it is on hold as unclear? @Frank you replied correctly to my edited question... you still think it's unclear? – user299791 Nov 11 '16 at 08:48
  • No, I don't think it's unclear now. I took back my vote, but wasn't able to re-vote to close it as a dupe (since I can't vote twice). – Frank Nov 11 '16 at 12:35
  • what I don't understand is why there are still votes to close it because unclear... are these people gonna check again? or should I do something? – user299791 Nov 11 '16 at 12:47
  • Hm, I don't know what you can do. I've voted to reopen which will put it in a queue where others can vote as well. And I mentioned it to some others in chat. – Frank Nov 11 '16 at 13:04
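
A minimal sketch of the `table` approach suggested in the comments, run on the question's sample data. It assumes the intended average is the weekly count divided by the number of distinct users, so it divides by `uniqueN(User)`; the comment divides by `uniqueN(WeekNumber)`, which happens to be 2 as well for this data. The name `avg` is only illustrative.

```r
library(data.table)

# Sample data from the question (HowMany is not needed here)
data <- data.table(WeekNumber = c(33, 33, 33, 34, 34, 33, 33, 34, 34),
                   User       = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
                   Processed  = c(1, 1, 0, 0, 1, 0, 1, 0, 1))

# Cross-tabulate rows by week and Processed flag, then divide every
# cell by the number of distinct users (assumption: "average" = per user)
avg <- data[, table(WeekNumber, Processed) / uniqueN(User)]
avg
```

For this data the week 33 row comes out as 1 and 1.5, and the week 34 row as 1 and 1, matching the desired result in the question.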

1 Answer

What about something like this:

dat[, user_processed := paste(User, Processed, sep="_")]
dcast(dat, WeekNumber~user_processed, value.var="Processed", length) 

Which gives you:

   WeekNumber 10001041_1 10001042_0 10001042_1
1:         33          0          3          2
2:         43          5          0          0

Sample data used:

dat <- fread("User Processed WeekNumber
  1: 10001042         0         33
  2: 10001042         0         33
  3: 10001042         1         33
  4: 10001042         0         33
  5: 10001042         1         33
  870: 10001041         1         43
  871: 10001041         1         43
  872: 10001041         1         43
  873: 10001041         1         43
  874: 10001041         1         43")

dat <- dat[, V1 := NULL]
setnames(dat, c("User", "Processed", "WeekNumber"))
Rentrop
  • I am sorry, but can you point out where you compute the average as requested in the question? – user299791 Nov 10 '16 at 17:42
  • @user299791 Your question is vague. In the R tag, as elsewhere on SO, you're expected to post a minimal reproducible example with corresponding output. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250 – Frank Nov 10 '16 at 18:46
  • @user299791 you could use `mean` instead of `length`, but I think fundamentally everything is there to produce the expected output... Like Frank, I am not 100% sure what you expect. – Rentrop Nov 10 '16 at 19:31
  • I have rewritten the question; I hope my point is clearer now – user299791 Nov 10 '16 at 19:52
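
Following the suggestion above to use `mean` as the aggregation function, here is a hedged sketch on the rewritten question's data. It assumes "average" means the mean of per-user counts within each week, so it first counts rows per user and then lets `dcast` average across users. The names `per_user` and `res` are illustrative and not part of the original answer.

```r
library(data.table)

# Sample data from the rewritten question
data <- data.table(WeekNumber = c(33, 33, 33, 34, 34, 33, 33, 34, 34),
                   User       = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
                   Processed  = c(1, 1, 0, 0, 1, 0, 1, 0, 1))

# Step 1: count rows per week, user and Processed flag
per_user <- data[, .(N = .N), by = .(WeekNumber, User, Processed)]

# Step 2: reshape to wide, averaging the per-user counts within each week
res <- dcast(per_user, WeekNumber ~ Processed,
             value.var = "N", fun.aggregate = mean)
res
```

One caveat of this sketch: a user with no rows for a given week and Processed value is simply absent from `per_user` rather than counted as zero, so on other data the result can differ from dividing the totals by the number of users.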