I referred to this thread to Create table with all pairs of values from one column in R, counting unique values and Table of Interactions - Case with pets and houses understand how to create 2-way interaction table. How could I do this for all possible cases? Also, I want to find both frequency of occurrence and revenue in these bins (combinations).
Here's my input data
Customer Product Revenue
1 A Rice 10
2 A Sweet Potato 2
3 A Walnut 4
4 B Rice 3
5 B Walnut 2
6 C Walnut 3
7 C Sweet Potato 4
8 D Rice 3
9 E Sweet Potato 4
10 F Walnut 7
11 G Rice 2
12 G Sweet Potato 3
13 H Sweet Potato 4
14 H Walnut 6
15 I Rice 2
DFI <- structure(list(Customer = c("A", "A", "A", "B", "B", "C", "C",
"D", "E", "F", "G", "G", "H", "H", "I"), Product = c("Rice",
"Sweet Potato", "Walnut", "Rice", "Walnut", "Walnut", "Sweet Potato",
"Rice", "Sweet Potato", "Walnut", "Rice", "Sweet Potato", "Sweet Potato",
"Walnut", "Rice"), Revenue = c(10, 2, 4, 3, 2, 3, 4, 3, 4, 7,
2, 3, 4, 6, 2)), .Names = c("Customer", "Product", "Revenue"), row.names = c(NA,
15L), class = "data.frame")
Here's the code to generate all combinations of product Sweet Potato
Rice
and Walnut
:
Combinations<-do.call(c,lapply(seq_along(unique(DFI$Product)),
combn, x = unique(DFI$Product), simplify = FALSE))
[[1]]
[1] "Rice"
[[2]]
[1] "Sweet Potato"
[[3]]
[1] "Walnut"
[[4]]
[1] "Rice" "Sweet Potato"
[[5]]
[1] "Rice" "Walnut"
[[6]]
[1] "Sweet Potato" "Walnut"
[[7]]
[1] "Rice" "Sweet Potato" "Walnut"
Here's my expected output data for frequency of occurrence as per combinations of product types:
Combination Frequency
1 R 2
2 S 1
3 W 1
4 R,S 1
5 S,W 2
6 R,W 1
7 R,S,W 1
DFOUTa <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W",
"R,S,W"), Frequency = c(2, 1, 1, 1, 2, 1, 1)), .Names = c("Combination",
"Frequency"), row.names = c(NA, 7L), class = "data.frame")
Here's my expected output data for revenue in the bins (i.e. combinations of product types):
Combination Revenue
1 R 5
2 S 4
3 W 7
4 R,S 5
5 S,W 17
6 R,W 5
7 R,S,W 16
DFOUTb <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W",
"R,S,W"), Revenue = c(5, 4, 7, 5, 17, 5, 16)), .Names = c("Combination",
"Revenue"), row.names = c(NA, 7L), class = "data.frame")
I have manually generated above data. I have double-checked to ensure no error.
I am unsure how to generate the two output that I am looking for. I'd sincerely appreciate any help. I'd prefer data.table
based approach because of the size of the data I have in my original dataset.
PS: I have shortened Product Names Rice
, Sweet Potato
and Walnut
to R
, S
, W
respectively in the output file for sake of brevity.