1

I have the following data set and I would like to identify the product with the highest amount per customer_ID and convert it into a new column. I also want to keep only one record per ID.

Data to generate the data set:

x <- data.frame(customer_id=c(1,1,1,2,2,2), product=c("a","b","c","a","b","c"), amount=c(50,125,100,75,110,150))

Actual data set looks like this:

customer_id product amount 1 a 50 1 b 125 1 c 100 2 a 75 2 b 110 2 c 150

Desired output wanted should look like this:

customer_ID product_b product_c 1 125 0 2 0 150

stuski
  • 199
  • 1
  • 11

2 Answers2

2

We can do this with tidyverse. After grouping by 'customer_id', slice the row that has the maximum 'amount', paste with prefix ('product_') to 'product' column (if needed) and spread to wide format

library(dplyr)
library(tidyr)
x %>%
   group_by(customer_id) %>% 
   slice(which.max(amount)) %>% 
   mutate(product = paste0("product_", product)) %>%
   spread(product, amount, fill = 0)
#  customer_id product_b product_c
#*       <dbl>     <dbl>     <dbl>
#1           1       125         0
#2           2         0       150

Another option is to arrange the dataset by 'customer_id' and 'amount' in descending order, get the distinct rows based on 'customer_id' and `spread to 'wide'

arrange(x, customer_id, desc(amount)) %>%
        distinct(customer_id, .keep_all = TRUE) %>% 
        spread(customer_id, amount, fill = 0)
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Using reshape2 package,

library(reshape2)

x1 <- x[!!with(x, ave(amount, customer_id, FUN = function(i) i == max(i))),]

dcast(x1, customer_id ~ product, value.var = 'amount', fill = 0)
#  customer_id   b   c
#1           1 125   0
#2           2   0 150
Sotos
  • 51,121
  • 6
  • 32
  • 66