2

I have a dataset with three column

  Date1      StudentId  Status
  08/04/2014 155261     Yes
  08/04/2014 155261     No
  08/25/2014 236991     Yes
  08/27/2014 236991     Yes
  08/29/2014 236991     Yes

I am trying to aggregate the Status by Id and Date1, such that the final dataset will be like this

  Date1      StudentId  Response
  08/04/2014 155261     Yes, No
  08/25/2014 236991     Yes
  08/27/2014 236991     Yes
  08/29/2014 236991     Yes

I tried using the gsub function but it didnt work, it only aggregates based on StudentId and skips the Date, any help on this issue is much appreciated.

 dataset1[,Response:=gsub("(, )+$","",c(paste(Status,collapse=", "),rep("",.N-1))),by=c("StudentId ","Date1")]
Ezra Polson
  • 235
  • 3
  • 13
  • Are you using a `data.table` (required for use of `:=`)? What does it mean that it didn't work here? – Frank May 21 '15 at 20:16
  • @Frank, it was only aggregating based on StudentId and unable to combine both StudentId and Date, bgoldst solution fixed the problem – Ezra Polson May 21 '15 at 20:30
  • 1
    @Ezra if you already have a `data.table` object, using base R aggregating operations is like riding a bike when you own a Jaguar (unless you are doing it for the sports) – David Arenburg May 21 '15 at 20:39

2 Answers2

3
df <- data.frame(Date1=c('08/04/2014','08/04/2014','08/25/2014','08/27/2014','08/29/2014'), StudentId=c(155261,155261,236991,236991,236991), Status=c('Yes','No','Yes','Yes','Yes') );
aggregate(Status~Date1+StudentId,df,paste,collapse=', ');
##        Date1 StudentId  Status
## 1 08/04/2014    155261 Yes, No
## 2 08/25/2014    236991     Yes
## 3 08/27/2014    236991     Yes
## 4 08/29/2014    236991     Yes

You can rename the column from Status to Response separately:

names(df)[names(df)=='Status'] <- 'Response';
bgoldst
  • 34,190
  • 6
  • 38
  • 64
  • 2
    Just be aware that `Status` is a list and I doubt it useful for the OP in any way, see `str(aggregate(Status~.,df,paste))`. So this solution is basically wrong. An optional solution is `aggregate(Status~.,df,toString)` – David Arenburg May 21 '15 at 20:32
  • 1
    Ah, I see now, effectively `paste` in the first version is the same as `as.character`. As long as the "edit" version is what you'd recommend, you might as well put it at the top, right? – Frank May 21 '15 at 20:38
  • Why not just `aggregate(Status~.,df,toString)` though? – David Arenburg May 21 '15 at 20:38
  • 2
    Maybe because that function is too obscure for us to know :) – Frank May 21 '15 at 20:39
  • 1
    @Frank don't even start, mister `[.listof` – David Arenburg May 21 '15 at 20:40
3

No gsub is needed to concatenate strings (thanks to @DavidArenburg for a simplifcation):

DT1 <- DT[,list(Response=toString(Status)),by=list(Date1,StudentId)]

If students might show up multiple times with the same status, you'll want to use unique on Status.


Instead of using a string, you could store the values in a list with list(String).

  • The good: Set operations like %in% can be used and may be more intuitive than string parsing.
  • The bad: List columns cannot be used in by operations and are generally cumbersome.
Frank
  • 66,179
  • 8
  • 96
  • 180