1

My question is an extension of this one : Create a column with a count of occurrences in R

Given their example, I would like to see a count of how many of each products each person has.

I have a very large dataset which has this structure:

df <- c(Contact.ID, Date.Time, Week, Attendance, WeeklyAT)

 Contact.ID       Date    Time    Week   Attendance  *WeeklyAT 
1   A       2012-10-06 18:54:48   44         30          *2
2   A       2012-10-08 20:50:18   44         30          *2
3   A       2013-05-24 20:18:44   21         30          *1
4   B       2012-11-15 16:58:15   46         40          *1 
5   B       2013-01-09 10:57:02    2         40          *3
6   B       2013-01-11 17:31:22    2         40          *3
7   B       2013-01-14 18:37:00    2         40          *3
8   C       2013-02-22 17:46:07    8         5           *1
9   C       2013-02-27 11:21:00    9         5           *1
10  D       2012-10-28 14:48:33   43         12          *1

I am trying to find the WeeklyAT column, counting how often a week number occurs PER ID (which is the difference to the other question as I am not looking at general occurrence or count of weeks per ID)

I have tried these two approaches and mixtures of them but can't seem to find the right one:

df <- transform(df, WeeklyAT = ave(seq(nrow(df)), Gym$Week, FUN=length))

df <- within(df, { WeeklyAT = ave(df$Week, df$Contact.ID, FUN = function(x) length(unique(x))) })

I greatly appreciate any help or tip :) Many Thanks.

Jaap
  • 81,064
  • 34
  • 182
  • 193
Fee
  • 89
  • 1
  • 9

1 Answers1

2

Using data.table you could do something like the following:

library(data.table)
setDT(df)
merge(df, df[, WeeklyAT := .N, by = .(Contact.ID, Week)])

       Contact.ID       Date     Time Week Attendance X.WeeklyAT WeeklyAT
 1:          A 2012-10-06 18:54:48   44         30         *2        2
 2:          A 2012-10-08 20:50:18   44         30         *2        2
 3:          A 2013-05-24 20:18:44   21         30         *1        1
 4:          B 2012-11-15 16:58:15   46         40         *1        1
 5:          B 2013-01-09 10:57:02    2         40         *3        3
 6:          B 2013-01-11 17:31:22    2         40         *3        3
 7:          B 2013-01-14 18:37:00    2         40         *3        3
 8:          C 2013-02-22 17:46:07    8          5         *1        1
 9:          C 2013-02-27 11:21:00    9          5         *1        1
10:          D 2012-10-28 14:48:33   43         12         *1        1

EDIT:

Apparently dplyrcan do something very similar:

library(dplyr)
merge(df, 
      df %>% group_by(Contact.ID, Week) %>% summarise(WeeklyAT = n()))
s_baldur
  • 29,441
  • 4
  • 36
  • 69