I am trying to do Market Basket Analysis in R and have run into a problem. I have a .csv file with two columns, "Product" and "Customer". A customer number is repeated once for each different product that customer purchased. The table looks like this:

Product Customer    
114  1    
112  2    
112  1   
113  4    
115  3    
113  2   
111  2    
113  3

I need to reshape it like this: two columns, Customer and Products, with all the products a customer bought in one cell.

Customer Products

1 114, 112    
2 112, 113, 111    
3 115, 113    
4 113

What should I do?

Any help would be great!

David Arenburg

2 Answers

You can use aggregate (df contains your original data frame):

aggregate(Product ~ Customer, df, paste, collapse = ", ")
#   Customer       Product
# 1        1      114, 112
# 2        2 112, 113, 111
# 3        3      115, 113
# 4        4           113

or dcast depending on what you want the output to look like:

library(reshape2)
dcast(transform(df, count = 1), Customer ~ Product, value.var = "count", fill = 0)
#   Customer 111 112 113 114 115
# 1        1   0   1   0   1   0
# 2        2   1   1   1   0   0
# 3        3   0   0   1   0   1
# 4        4   0   0   1   0   0
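
Both shapes can also be built with base R alone. The following is a minimal sketch, assuming the sample data from the question is in `df`; the names `baskets` and `incidence` are only illustrative. It uses `toString()` so that each customer's products end up in a single character cell, and `table()` for the customer-by-product matrix.

```r
# Sample data from the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  Product  = c(114, 112, 112, 113, 115, 113, 111, 113),
  Customer = c(1, 2, 1, 4, 3, 2, 2, 3)
)

# One row per customer; toString() joins each group with ", "
baskets <- aggregate(Product ~ Customer, data = df, FUN = toString)

# Customer-by-product counts (rows are customers, columns are products)
incidence <- table(df$Customer, df$Product)

print(baskets)
print(incidence)
```

Here `baskets$Product` is a plain character column, which is convenient if the result is going to be written back to a .csv file.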
lukeA
If your data is

Product <- c(114,112,112,113,115,113,111,113)
Customer <- c(1,2,1,4,3,2,2,3)
df <- data.frame(Product,Customer)

you can use tapply:

tapply(df$Product, df$Customer, list)

$`1`
[1] 114 112

$`2`
[1] 112 113 111

$`3`
[1] 115 113

$`4`
[1] 113
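
To get from a per-customer list like the one above to the two-column table in the question, each vector still has to be collapsed into one string. A minimal sketch, assuming the same `df`; it uses `split()`, which produces a comparable named list, and the names `baskets` and `result` are only illustrative.

```r
# Sample data from the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  Product  = c(114, 112, 112, 113, 115, 113, 111, 113),
  Customer = c(1, 2, 1, 4, 3, 2, 2, 3)
)

# Named list with one vector of products per customer
baskets <- split(df$Product, df$Customer)

# Collapse each vector into a single comma-separated string
result <- data.frame(
  Customer = names(baskets),
  Products = vapply(baskets, toString, character(1)),
  row.names = NULL
)

print(result)
```

Note that `Customer` comes from the list names here, so it is a character column; wrap it in `as.numeric()` if numeric IDs are needed.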
rmuc8