I am looking to summarize a customer transactional dataframe to a single row per customer using dplyr. For continuous variables this is simple - use sum / mean etc. For categorical variables I would like to choose the "Mode" - i.e. the most commonly encountered value within the group and do this across multiple columns e.g.:
For example to take the table Cus1
Cus <- data.frame(Customer = c("C-01", "C-01", "C-02", "C-02", "C-02", "C-02", "C-03", "C-03"),
Product = c("COKE", "COKE", "FRIES", "SHAKE", "BURGER", "BURGER", "CHICKEN", "FISH"),
Store = c("NYC", "NYC", "Chicago", "Chicago", "Detroit", "Detroit", "LA", "San Fran")
)
And generate the table Cus_Summary:
Cus_Summary <- data.frame(Customer = c("C-01", "C-02", "C-03"),
Product = c("COKE", "BURGER", "CHICKEN"),
Store = c("NYC", "Chicago", "LA")
)
Are there any packages that can provide this function? Or has anyone a function that can be applied across multiple columns within a dplyr step?
I am not worried about smart ways to handle ties - any output for a tie will suffice (although any suggestions as to how to best handle ties would be interesting and appreciated).