Group data in R by product number

Question

I have downloaded Amazon data from a website. It shows a product number and the recommended products that customers bought after buying that product.

For example the data file looks like this:

ProductID   Recommended Product ID
0           1
0           2
0           3
0           4
1           0
1           2

structure(list(ProductID = structure(c(1L, 1L, 1L, 1L, 2L, 2L
), .Label = c("0", "1"), class = "factor"), Recommended_Product_ID = structure(c(1L, 
2L, 3L, 4L, 2L, 3L), .Label = c("1", "2", "3", "4"), class = "factor")), .Names = c("ProductID", 
"Recommended_Product_ID"), row.names = c(NA, -6L), class = "data.frame")

This is an example of the data file. We have to use the bipartite package for this, so I have to skip connections that are repeated in the dataset. For example, the dataset above has a connection from:

0   1

Since there is already a connection from 0 to 1, we skip the reverse connection:

1   0

Here is what I have currently:

library(bipartite)
library(igraph)
library(lpbrim)
data <- read.csv("./dataset.txt", header = F, sep = "\t", col.names = c("product1", "recommproduct"))
aggLevel = length(list(data$product1))

In the code, I am trying to find out how many other products were bought together with a given product, for example the product with ID 0. In the dataset, the recommended product ID column lists the products that were bought with the corresponding product ID.

When I print the variable aggLevel, I get 1 instead of the number of recommended products for the corresponding product ID.

Any help is appreciated.

You have a list with just one member, `list(data$product1)`. Its length is 1. There is no error. — Rui Barradas, Jan 29 '19 at 21:49
I am trying to add the second column of the data to that list, so that I can calculate the number of recommended products for the corresponding product ID. — user2529660, Jan 29 '19 at 22:05
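
The point made in the first comment can be checked with a small sketch. The vector `x` below is a hypothetical stand-in for `data$product1`, filled with the product IDs from the example table; it is not the asker's actual file.

```r
# Hypothetical stand-in for data$product1 (values taken from the example table).
x <- c(0L, 0L, 0L, 0L, 1L, 1L)

# Wrapping a vector in list() creates a list with exactly one element
# (the whole vector), so its length is always 1.
n_list <- length(list(x))

# The length of the vector itself is the number of rows.
n_rows <- length(x)

# table() counts how often each product ID occurs.
per_product <- table(x)
print(per_product)
```

Here `n_list` is 1 whatever the vector contains, while `per_product` holds one count per product ID, which is closer to what the question asks for.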

score 0 · Answer 1 · answered Jan 29 '19 at 21:58

If you want to count the recommended products by ProductID, here are 3 base R ways.

xtabs( ~ ProductID, data)
tapply(data$Recommended, data$ProductID, length)
aggregate(Recommended ~ ProductID, data, length)

And one with package dplyr.

library(dplyr)
data %>% group_by(ProductID) %>% summarise(Count = n())

Data.

data <- read.csv(text = "
ProductID   ,Recommended Product ID
0           ,1
0           ,2
0           ,3
0           ,4
1           ,2
1           ,3                   
")
names(data)[2] <- "Recommended"

I tried your solution, but it does not remove connection between 1 and 0, if a connection already exists between 0 and 1. — user2529660, Jan 31 '19 at 19:14
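
One possible way to handle the reverse connections raised in this comment is sketched below. It is not part of the original answer. It assumes the IDs are integers (factor columns such as those in the question's `structure()` output would first need `as.integer(as.character(...))`), and the names `edges`, `lo`, `hi` and `deduped` are made up for illustration. The idea is to put each pair in a fixed order with `pmin()` and `pmax()`, so that 1 → 0 and 0 → 1 produce the same key, and then drop repeated keys with `duplicated()`.

```r
# Toy edge list copied from the question's example table (integer IDs).
edges <- data.frame(
  ProductID   = c(0L, 0L, 0L, 0L, 1L, 1L),
  Recommended = c(1L, 2L, 3L, 4L, 0L, 2L)
)

# Order each pair so that (1, 0) and (0, 1) map to the same (lo, hi) key.
lo <- pmin(edges$ProductID, edges$Recommended)
hi <- pmax(edges$ProductID, edges$Recommended)

# Keep only the first occurrence of each unordered pair.
deduped <- edges[!duplicated(data.frame(lo, hi)), ]

# Count the remaining recommendations per product.
counts <- table(deduped$ProductID)
print(deduped)
print(counts)
```

On this toy data the row 1 → 0 is dropped because 0 → 1 appears earlier, leaving four connections for product 0 and one for product 1. Any of the counting approaches from the answer could be applied to `deduped` in the same way.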
