I have downloaded Amazon data from a website. Each row shows a product ID and the ID of a recommended product that a customer bought after buying that product.

For example the data file looks like this:

ProductID   Recommended Product ID
0           1
0           2
0           3
0           4
1           0
1           2

structure(list(ProductID = structure(c(1L, 1L, 1L, 1L, 2L, 2L
), .Label = c("0", "1"), class = "factor"), Recommended_Product_ID = structure(c(1L, 
2L, 3L, 4L, 2L, 3L), .Label = c("1", "2", "3", "4"), class = "factor")), .Names = c("ProductID", 
"Recommended_Product_ID"), row.names = c(NA, -6L), class = "data.frame")

This is an example of the data file. I have to use the bipartite package for this, so I need to skip connections that are repeated in the dataset. For example, the dataset above has a connection from:

0   1

Since there is already a connection from 0 to 1, we skip the reverse connection:

1   0

Here is what I have currently:

library(bipartite)
library(igraph)
library(lpbrim)
data <- read.csv("./dataset.txt", header = F, sep = "\t", col.names = c("product1", "recommproduct"))
aggLevel = length(list(data$product1))

With this code I am trying to find out, for a given product (say ID 0), how many other products were bought along with it. In the dataset, those are the IDs listed in the recommended product column for that product ID.

When I print the variable aggLevel, I get 1 instead of the number of recommended products for the corresponding product ID.

Any help is appreciated.

1 Answer

If you want to count the recommended products by ProductID, here are 3 base R ways.

xtabs( ~ ProductID, data)
tapply(data$Recommended, data$ProductID, length)
aggregate(Recommended ~ ProductID, data, length)

And one with package dplyr.

library(dplyr)
data %>% group_by(ProductID) %>% summarise(Count = n())
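
As a side note on why aggLevel comes out as 1 in the question: list(x) wraps the whole column in a list with a single element, so its length is 1 however many rows the column has. A minimal sketch, assuming a hypothetical product1 vector that stands in for the column read from the file:

```r
# Hypothetical stand-in for the product1 column in the question's data
product1 <- c(0, 0, 0, 0, 1, 1)

# list() wraps the vector in a one-element list, so the length is 1
length(list(product1))

# The length of the vector itself is the number of rows
length(product1)

# Rows per product ID, i.e. recommendations recorded for each product
counts <- table(product1)
counts["0"]
```

The per-product count is what the grouped summaries above compute.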

Data.

data <- read.csv(text = "
ProductID   ,Recommended Product ID
0           ,1
0           ,2
0           ,3
0           ,4
1           ,2
1           ,3                   
")
names(data)[2] <- "Recommended"
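
The counts above treat 0 -> 1 and 1 -> 0 as separate rows. If a reversed pair should be skipped, as the question describes, one possible approach is to order each pair with pmin() and pmax() and drop the duplicates before counting. This is only a sketch: the edges data frame below is a hypothetical copy of the question's table (including the 1 -> 0 row), and it assumes the IDs are numeric rather than factors.

```r
# Hypothetical edge list modelled on the question's table, including the
# reversed pair 1 -> 0 that should be skipped
edges <- data.frame(
  ProductID   = c(0, 0, 0, 0, 1, 1),
  Recommended = c(1, 2, 3, 4, 0, 2)
)

# Order each pair so that 0 -> 1 and 1 -> 0 share the same key
lo <- pmin(edges$ProductID, edges$Recommended)
hi <- pmax(edges$ProductID, edges$Recommended)

# Keep only the first occurrence of each unordered pair
unique_edges <- edges[!duplicated(data.frame(lo, hi)), ]

# Count the remaining recommendations per product
counts <- table(unique_edges$ProductID)
```

Because duplicated() flags later occurrences only, whichever direction appears first in the file is the one that is kept.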
Rui Barradas
  • I tried your solution, but it does not remove connection between 1 and 0, if a connection already exists between 0 and 1. – user2529660 Jan 31 '19 at 19:14