Relational Database in R

Question

I'm analyzing data from an eCommerce site and everything is stored in a relational format.

For each user, I want to calculate the probability that the user buys a given product: the number of times the user ordered the product divided by the user's total number of orders.

The final result should look like this:

User | Product | Probability
   1 |    2323 |        0.32

userid <- c(1,1,1,1,2,2,2,2)
product <- c(876,324,122,65,44,324,54,23)
probability <- c(0.32,0.10,0.25,0.5,0.7,0.8,0.45,0.05)
exampleresult <- data.frame(userid,product,probability)

Example data:

orderid <- c(100,111,122,134,144,152,164,177,188,199,200,251,222)
userid <- c(1,1,1,2,2,2,2,3,3,4,5,5,6)
orders <- data.frame(orderid, userid)

productid <- c(66,55,44,54,32,23,65,122,656,324,876,342)
productname <- c('soda','corn','apple','milk','juice','water','potato','banana','orange','fish','meat','salami')
products <- data.frame(productid, productname)

orderid <- c(100,100,100,100,100,111,111,111,122,134,134,134,134,144,144,144,144,144,144,152,164,177,188,188,188,188,199,200,251,222)
productid <- c(55,54,324,23,324,54,876,324,122,65,65,44,324,54,23,44,324,23,66,876,65,55,32,122,66,66,44,54,66,65)
ordpro <- data.frame(orderid, productid)

Every time a user buys something, an order is created containing all the products bought. One user can have multiple orders, and each order can have multiple products.

This is my current attempt, which does not work. It is also very slow given the number of users.

x <- numeric(length(unique(orders$userid)))  
y <- list()
for (i in 1:numeric(length(unique(orders$userid)))) {
  y[[i]] <- table(ordpro[ordpro$orderid %in% orders[orders$userid == "orderid"], "productid"])/length(orders,[orders$userid == i,"orderid"])
  x[i] <- length(y[[i]])
}
mydata <- data.frame(x,y)

Have you considered joining the bases or is that not feasible in your case? — RobertMyles, Jun 19 '17 at 19:47
I have not. It would be a huge dataframe but it might be possible. — dank, Jun 19 '17 at 19:48
@italo have a look at dplyr, particularly the new version, which allows you to work with the base while it's not in memory in R. https://github.com/tidyverse/dplyr — RobertMyles, Jun 19 '17 at 19:51
I improved the format and created example data the second time. @BobJansen — dank, Jun 19 '17 at 19:51

score 0 · Accepted Answer · answered Jun 19 '17 at 20:11

Use the merge function to join your data frames:

http://www.statmethods.net/management/merging.html

See also the question "How to join (merge) data frames (inner, outer, left, right)?"

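Avoid loops: joining and aggregating whole data frames is usually much faster in R than iterating over users one by one.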
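
As a rough sketch of what a merge-based approach could look like on the example data from the question (this is not code from the original answer): it joins the order lines to their users, counts the orders containing each product per user, and divides by each user's total number of orders. Counting a product at most once per order is an assumption about the intended definition, and the names order_lines, joined, counts, totals, n_with_product and n_orders are made up for illustration.

```r
# Example data copied from the question
orders <- data.frame(
  orderid = c(100,111,122,134,144,152,164,177,188,199,200,251,222),
  userid  = c(1,1,1,2,2,2,2,3,3,4,5,5,6)
)
ordpro <- data.frame(
  orderid   = c(100,100,100,100,100,111,111,111,122,134,134,134,134,144,144,
                144,144,144,144,152,164,177,188,188,188,188,199,200,251,222),
  productid = c(55,54,324,23,324,54,876,324,122,65,65,44,324,54,23,
                44,324,23,66,876,65,55,32,122,66,66,44,54,66,65)
)

# Assumption: a product listed twice in the same order counts only once
order_lines <- unique(ordpro)

# Attach the user to every order line
joined <- merge(order_lines, orders, by = "orderid")

# Number of orders containing each product, per user
counts <- aggregate(orderid ~ userid + productid, data = joined, FUN = length)
names(counts)[names(counts) == "orderid"] <- "n_with_product"

# Total number of orders per user
totals <- aggregate(orderid ~ userid, data = orders, FUN = length)
names(totals)[names(totals) == "orderid"] <- "n_orders"

# Combine both counts and compute the probability
result <- merge(counts, totals, by = "userid")
result$probability <- result$n_with_product / result$n_orders
result <- result[order(result$userid, result$productid),
                 c("userid", "productid", "probability")]

print(head(result))
```

If repeated lines within one order should count separately, drop the unique() step; the ratio can then exceed 1 in general. A further merge with products by "productid" would add the product names.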

napsta32
