I have a data set with three columns. The first column contains product names (A through E), and the other two columns contain each product's two nearest neighbors, i.e. customers who own the product in the first column are most likely to buy those two products next.

m1 = data.frame(Product=c("A","B","C","D","E"), V1=c("C","A","A","A","D"), 
                V2=c("D","D","B","E","A"))

In the second data set, I have data at the user level. The first column contains user IDs, and the other five columns indicate whether the user owns each product: 1 means the user owns it, 0 means the user does not.

m2 = data.frame(ID = c(1:7), A = rbinom(7,1,1/2), B = rbinom(7,1,1/2), 
                C = rbinom(7,1,1/2), D = rbinom(7,1,1/2), E = rbinom(7,1,1/2))

I want product recommendations at the user level: m1 should be merged with m2 based on which products each user owns. The output should look like this:

User - 1 A D

Riya

1 Answer

You haven't posted a reproducible example or the exact expected results, but this seems to do what you want.

set.seed(321)
m1 = data.frame(Product=c("A","B","C","D","E"), V1=c("C","A","A","A","D"), 
                V2=c("D","D","B","E","A"))
m2 = data.frame(ID = c(1:7), A = rbinom(7,1,1/2), B = rbinom(7,1,1/2), 
                C = rbinom(7,1,1/2), D = rbinom(7,1,1/2), E = rbinom(7,1,1/2))

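# For each client (row of m2), collect the neighbours of every owned product
# and keep the two that occur most often.
recommended <- apply(m2, 1, function(x) {
  # x[-1] holds the 0/1 ownership flags; select the matching rows of m1
  client.recommended <- m1[as.logical(x[-1]),-1]
  top <- names(sort(table(as.vector(t(client.recommended))),
                    decreasing = TRUE)[1:2])
  c(x[1], top)
})

# apply() returns one column per client, so transpose before converting
recommended <- as.data.frame(t(recommended), stringsAsFactors = FALSE)
recommended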
  ID V2 V3
1  1  A  B
2  2  A  D
3  3  A  B
4  4  A  D
5  5  A  D
6  6  A  D
7  7  A  B

What this code does:

  • For every row in the m2 data.frame (every client), take that row.
  • Take the subset of the m1 data.frame corresponding to the products owned in that row (if the client owns "A" and "B", take rows "A" and "B" from m1).
  • Turn this subset into a vector.
  • Count the occurrences of each unique value in the vector.
  • Sort the unique values by count.
  • Take the two most common values.
  • Return these values along with the client ID.
  • Turn everything into a proper data.frame for further processing.
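The steps above can be traced by hand for a single client. This is only an illustrative sketch: the `owned` vector is a hypothetical client who owns "A" and "B", not a row of the randomly generated m2, and `candidates` and `counts` are names introduced here for readability.

```r
# Lookup table from the question: each product and its two nearest neighbours.
m1 <- data.frame(Product = c("A", "B", "C", "D", "E"),
                 V1 = c("C", "A", "A", "A", "D"),
                 V2 = c("D", "D", "B", "E", "A"),
                 stringsAsFactors = FALSE)

# Hypothetical client (assumption, not taken from m2) who owns A and B only.
owned <- c(A = 1, B = 1, C = 0, D = 0, E = 0)

# Keep the m1 rows for the owned products and drop the Product column.
client.recommended <- m1[as.logical(owned), -1]

# Flatten the two neighbour columns into one character vector.
candidates <- as.vector(t(client.recommended))

# Count each candidate and sort by count, most frequent first.
# D occurs twice here; A and C occur once each.
counts <- sort(table(candidates), decreasing = TRUE)

# The first two names are the recommendation for this client.
top <- names(counts)[1:2]
print(counts)
print(top)
```

Note that "A" can appear among the candidates even though this client already owns it, because nothing in these steps filters out owned products.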

It seems that you expect only two products for each client, and that is what this code does. For products with the same number of occurrences, the one that comes first alphabetically apparently wins. You can get all recommended products by dropping the [1:2] part, but then you will need to figure out how to coerce vectors of uneven length into a single data.frame.
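One possible way around the uneven-length problem, sketched here under the assumption that a long format (one row per client and recommended product) is acceptable, is to keep the per-client results in a list and stack them afterwards. The names `all.recommended` and `long` are made up for this example.

```r
set.seed(321)
m1 <- data.frame(Product = c("A", "B", "C", "D", "E"),
                 V1 = c("C", "A", "A", "A", "D"),
                 V2 = c("D", "D", "B", "E", "A"),
                 stringsAsFactors = FALSE)
m2 <- data.frame(ID = 1:7, A = rbinom(7, 1, 1/2), B = rbinom(7, 1, 1/2),
                 C = rbinom(7, 1, 1/2), D = rbinom(7, 1, 1/2),
                 E = rbinom(7, 1, 1/2))

# One character vector per client: every neighbour of an owned product,
# most frequent first. The vectors may differ in length between clients.
all.recommended <- lapply(seq_len(nrow(m2)), function(i) {
  owned <- as.logical(unlist(m2[i, -1]))
  candidates <- unlist(m1[owned, -1], use.names = FALSE)
  names(sort(table(candidates), decreasing = TRUE))
})
names(all.recommended) <- m2$ID

# Long format: repeat each ID once per recommended product.
long <- data.frame(ID = rep(m2$ID, lengths(all.recommended)),
                   Product = as.character(unlist(all.recommended,
                                                 use.names = FALSE)),
                   stringsAsFactors = FALSE)
head(long)
```

A client who owns nothing simply contributes no rows to `long`. If a wide layout is needed instead, the shorter vectors would have to be padded with NA first.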

Mirek Długosz