CUSTOMER DATE   FEATURE
1        202001 A
1        202001 B
1        202002 A
2        202001 C
2        202002 A
2        202002 B
2        202002 C

I have a dataset like the one above, and I want to collect all FEATUREs at each time point (DATE) for each CUSTOMER, like below:

CUSTOMER DATE   FEATURE ALL_FEATURES
1        202001 A       A,B
1        202001 B       A,B
1        202002 A       A
2        202001 C       C
2        202002 A       A,B,C
2        202002 B       A,B,C
2        202002 C       A,B,C

I tried the dcast function, e.g. dcast(df, CUSTOMER, DATE~FEATURE), to spread the FEATUREs into separate columns, but from there it gets too complicated: there are 9 possibilities to handle with ifelse.

How can I do this in a simple way? Thanks.

camille
S.Lee
    Does this answer your question? [Concatenate strings by group with dplyr](https://stackoverflow.com/questions/38514988/concatenate-strings-by-group-with-dplyr) – camille Apr 20 '20 at 19:24

2 Answers


We can group by 'CUSTOMER' and 'DATE', then paste the 'FEATURE' values of each group together with str_c:

library(dplyr)
library(stringr)
df1 %>%
   group_by(CUSTOMER, DATE) %>%
   mutate(ALL_FEATURES = str_c(FEATURE, collapse = ","))
# A tibble: 7 x 4
# Groups:   CUSTOMER, DATE [4]
#  CUSTOMER   DATE FEATURE ALL_FEATURES
#     <int>  <int> <chr>   <chr>       
#1        1 202001 A       A,B         
#2        1 202001 B       A,B         
#3        1 202002 A       A           
#4        2 202001 C       C           
#5        2 202002 A       A,B,C       
#6        2 202002 B       A,B,C       
#7        2 202002 C       A,B,C       

data

df1 <- structure(list(CUSTOMER = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), DATE = c(202001L, 
202001L, 202002L, 202001L, 202002L, 202002L, 202002L), FEATURE = c("A", 
"B", "A", "C", "A", "B", "C")), class = "data.frame", row.names = c(NA, 
-7L))
akrun

One base R option is to use ave, e.g.,

df <- within(df, ALL_FEATURES <- ave(FEATURE, CUSTOMER, DATE, FUN = list))

or

df <- within(df, ALL_FEATURES <- ave(FEATURE, CUSTOMER, DATE, FUN = toString))

such that

> df
  CUSTOMER   DATE FEATURE ALL_FEATURES
1        1 202001       A         A, B
2        1 202001       B         A, B
3        1 202002       A            A
4        2 202001       C            C
5        2 202002       A      A, B, C
6        2 202002       B      A, B, C
7        2 202002       C      A, B, C

DATA

df <- structure(list(CUSTOMER = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), DATE = c(202001L, 
202001L, 202002L, 202001L, 202002L, 202002L, 202002L), FEATURE = c("A", 
"B", "A", "C", "A", "B", "C")), class = "data.frame", row.names = c(NA, 
-7L))
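
Note that toString separates the values with ", " (comma plus space), while the output requested in the question uses a bare ",". A minimal sketch, assuming the same sample data as above, that passes a small wrapper around paste to ave instead; the helper name collapse_comma is made up for illustration and is not part of base R:

```r
# Same sample data as above, rebuilt so the snippet runs on its own
df <- data.frame(
  CUSTOMER = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  DATE     = c(202001L, 202001L, 202002L, 202001L, 202002L, 202002L, 202002L),
  FEATURE  = c("A", "B", "A", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join one group's values with "," and no space
collapse_comma <- function(x) paste(x, collapse = ",")

# ave() applies the helper within each CUSTOMER/DATE group and
# returns a vector as long as the input, so row order is preserved
df$ALL_FEATURES <- ave(df$FEATURE, df$CUSTOMER, df$DATE, FUN = collapse_comma)

print(df)
```

With this data, ALL_FEATURES should come out as "A,B" for customer 1 in 202001 and "A,B,C" for customer 2 in 202002, matching the separator used in the question.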
ThomasIsCoding