My data are similar to the following dummy data:
dummy <- structure(list(id = c(1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10,
10, 10), dob = structure(c(1L, 1L, 6L, 6L, 6L, 3L, 9L, 2L, 5L,
7L, 4L, 8L, 6L, 6L, 6L), .Label = c("1990-01-01", "1991-11-12",
"1998-12-12", "1999-09-09", "2000-07-28", "2001-04-05", "2002-02-02",
"2004-12-16", "2012-05-06"), class = "factor"), date = structure(c(4L,
4L, 11L, 11L, 12L, 1L, 2L, 10L, 8L, 9L, 7L, 5L, 3L, 3L, 6L), .Label = c("2000-01-01",
"2000-01-03", "2002-12-15", "2003-01-06", "2003-04-05", "2003-12-15",
"2009-07-28", "2009-09-09", "2011-11-11", "2012-01-03", "2012-12-19",
"2012-12-31"), class = "factor"), text = structure(c(6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L), .Label = c("2aabb",
"2ccdd", "2eeff", "2gghh", "2iijj", "aa bb cc", "dd ee ff", "ghi",
"jklm", "nop", "qq rr", "sss ttt", "uv", "www xxx", "yy zz"), class = "factor"),
gender = structure(c(2L, 2L, 1L, 1L, 1L, 2L, 1L, 3L, 3L,
2L, 2L, 1L, 2L, 3L, 3L), .Label = c("f", "m", "mnx"), class = "factor")), .Names = c("id",
"dob", "date", "text", "gender"), row.names = c(NA, -15L), class = "data.frame")
> dummy
   id        dob       date     text gender
1   1 1990-01-01 2003-01-06 aa bb cc      m
2   1 1990-01-01 2003-01-06 dd ee ff      m
3   2 2001-04-05 2012-12-19      ghi      f
4   2 2001-04-05 2012-12-19     jklm      f
5   2 2001-04-05 2012-12-31      nop      f
6   3 1998-12-12 2000-01-01    qq rr      m
7   4 2012-05-06 2000-01-03  sss ttt      f
8   5 1991-11-12 2012-01-03       uv    mnx
9   6 2000-07-28 2009-09-09  www xxx    mnx
10  7 2002-02-02 2011-11-11    yy zz      m
11  8 1999-09-09 2009-07-28    2aabb      m
12  9 2004-12-16 2003-04-05    2ccdd      f
13 10 2001-04-05 2002-12-15    2eeff      m
14 10 2001-04-05 2002-12-15    2gghh    mnx
15 10 2001-04-05 2003-12-15    2iijj    mnx
My goal is a data frame that keeps all of the columns but has only one row per date within each id. Where several rows share the same id and date, the strings in 'text' for those rows should be concatenated, separated by a space. Here is what my goal data would look like:
dummy2 <- structure(list(id = c(1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10),
dob = structure(c(1L, 6L, 6L, 3L, 9L, 2L, 5L, 7L, 4L, 8L,
6L, 6L), .Label = c("1990-01-01", "1991-11-12", "1998-12-12",
"1999-09-09", "2000-07-28", "2001-04-05", "2002-02-02", "2004-12-16",
"2012-05-06"), class = "factor"), date = structure(c(4L,
11L, 12L, 1L, 2L, 10L, 8L, 9L, 7L, 5L, 3L, 6L), .Label = c("2000-01-01",
"2000-01-03", "2002-12-15", "2003-01-06", "2003-04-05", "2003-12-15",
"2009-07-28", "2009-09-09", "2011-11-11", "2012-01-03", "2012-12-19",
"2012-12-31"), class = "factor"), text = structure(c(5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L), .Label = c("2aabb",
"2ccdd", "2eeff 2gghh", "2iijj", "aa bb cc dd ee ff", "ghi jklm",
"nop", "qq rr", "sss ttt", "uv", "www xxx", "yy zz"), class = "factor"),
gender = structure(c(2L, 1L, 1L, 2L, 1L, 3L, 3L, 2L, 2L,
1L, 3L, 3L), .Label = c("f", "m", "mnx"), class = "factor")), .Names = c("id",
"dob", "date", "text", "gender"), row.names = c(NA, -12L), class = "data.frame")
> dummy2
   id        dob       date              text gender
1   1 1990-01-01 2003-01-06 aa bb cc dd ee ff      m
2   2 2001-04-05 2012-12-19          ghi jklm      f
3   2 2001-04-05 2012-12-31               nop      f
4   3 1998-12-12 2000-01-01             qq rr      m
5   4 2012-05-06 2000-01-03           sss ttt      f
6   5 1991-11-12 2012-01-03                uv    mnx
7   6 2000-07-28 2009-09-09           www xxx    mnx
8   7 2002-02-02 2011-11-11             yy zz      m
9   8 1999-09-09 2009-07-28             2aabb      m
10  9 2004-12-16 2003-04-05             2ccdd      f
11 10 2001-04-05 2002-12-15       2eeff 2gghh    mnx
12 10 2001-04-05 2003-12-15             2iijj    mnx
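To make the requirement concrete, here is a minimal sketch of a check for "each date within each id appears only once". The helper name has_unique_id_date is hypothetical (it is not from base R or any package), and the two small data frames are stand-ins for the first rows of the data above.

```r
# Hypothetical helper (not part of base R or any package):
# TRUE when no (id, date) combination occurs in more than one row.
has_unique_id_date <- function(df) {
  anyDuplicated(df[c("id", "date")]) == 0
}

# Small stand-ins for the first rows of the input and of the goal data.
before <- data.frame(
  id   = c(1, 1, 2, 2, 2),
  date = c("2003-01-06", "2003-01-06", "2012-12-19", "2012-12-19", "2012-12-31"),
  text = c("aa bb cc", "dd ee ff", "ghi", "jklm", "nop"),
  stringsAsFactors = FALSE
)
after <- data.frame(
  id   = c(1, 2, 2),
  date = c("2003-01-06", "2012-12-19", "2012-12-31"),
  text = c("aa bb cc dd ee ff", "ghi jklm", "nop"),
  stringsAsFactors = FALSE
)

has_unique_id_date(before)  # FALSE: id 1 has 2003-01-06 twice
has_unique_id_date(after)   # TRUE
```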
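I have tried two approaches. The first uses ddply from plyr:
library(plyr)
dummy$text <- as.character(dummy$text)
# Groups by id and date, but joins with ", " instead of a space and
# returns only id, date and the pasted column (dob and gender are dropped).
test1 <- ddply(dummy, .(id, date), summarise,
               paste0(unique(unlist(strsplit(text, split=", "))), collapse=", "))
The second is a loop that compares each row with the one before it:
textcon <- character(nrow(dummy))  # must exist before the loop assigns into it
for (i in seq_len(nrow(dummy))) {
  ifelse(dummy$id[i] == dummy$id[i-1],
         ifelse(dummy$date[i] == dummy$date[i-1],
                textcon[i] <- paste(dummy$text[i], dummy$text[i-1]),
                textcon[i] <- dummy$text[i]),
         textcon[i] <- dummy$text[i])
}
# Still has one row per original row, so duplicated dates remain.
test3 <- data.frame(dummy, textcon)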
I have also tried many other variants, but I can't work out how to get a result in which no date is duplicated within an id. This is similar to a couple of previous questions on SO, except that my problem requires grouping by two factors at once (id and date), not one.
Thanks in advance for any help.
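One possible way to group by id and date together, sketched in base R with ave() and duplicated(). This is only a sketch under a few assumptions: 'text' is already character (not factor), the data frame here is a small stand-in with the same columns as the data in the question, and for the other columns the value from the first row of each group is kept. In the full data, gender differs within one group (id 10 on 2002-12-15), so that column would need its own rule to match the goal data exactly.

```r
# Small stand-in with the same columns as the data in the question.
dummy <- data.frame(
  id     = c(1, 1, 2, 2, 2, 10, 10, 10),
  dob    = c("1990-01-01", "1990-01-01", "2001-04-05", "2001-04-05",
             "2001-04-05", "2001-04-05", "2001-04-05", "2001-04-05"),
  date   = c("2003-01-06", "2003-01-06", "2012-12-19", "2012-12-19",
             "2012-12-31", "2002-12-15", "2002-12-15", "2003-12-15"),
  text   = c("aa bb cc", "dd ee ff", "ghi", "jklm", "nop",
             "2eeff", "2gghh", "2iijj"),
  gender = c("m", "m", "f", "f", "f", "m", "mnx", "mnx"),
  stringsAsFactors = FALSE
)

# Within each id/date group, replace every row's text with the
# space-separated concatenation of all texts in that group.
dummy$text <- ave(dummy$text, dummy$id, dummy$date,
                  FUN = function(x) paste(x, collapse = " "))

# Keep only the first row of each id/date group; the remaining columns
# (dob, gender) therefore come from that first row.
result <- dummy[!duplicated(dummy[c("id", "date")]), ]
rownames(result) <- NULL

print(result)
```

Passing both id and date to ave() is what makes the grouping use the two factors at once; duplicated() on the same two columns then removes the repeated rows.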