I am trying to do some market basket analysis using the arules package, but when I use the summary() function on an itemMatrix object to check which items are the most frequent, the numbers do not add up.
If I do:
library(arules)
x <- read.transactions("Supermarket2014-15.csv")
summary(x)
I get:
transactions as itemMatrix in sparse format with
5001 rows (elements/itemsets/transactions) and
997 columns (items) and a density of 0.003557162
most frequent items:
     45      28      42      35      22 (Other)
    503     462     444     440     413   15474
But if I check with a for loop, or even in Excel, the count for product 45 is 513, not 503. The same goes for 28, which should be 499, and so on.
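A minimal sketch of the kind of manual cross-check described above, written in base R. The data frame is made-up toy data standing in for the real file, and the column names Day and Product are assumed from the description of the raw data, so treat it as an illustration rather than the exact check that was run:

```r
# Hypothetical toy data in place of the real CSV (not the actual file).
# Column names Day and Product are assumptions based on the description.
raw <- data.frame(
  Day     = c(1, 1, 1, 2, 2, 3),
  Product = factor(c(45, 28, 45, 45, NA, 28))
)

# Count every row in which each product appears, leaving NA out,
# and sort so the most frequent product comes first.
raw_counts <- sort(table(raw$Product, useNA = "no"), decreasing = TRUE)
print(raw_counts)
```

On the arules side, itemFrequency(x, type = "absolute") returns the count for every item rather than only the top five shown by summary(), so the two sets of counts can be compared item by item.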
The odd thing is that if I sum all the totals (15474 + 413 + 440 + 444 + 462 + 503 = 17736), I get the correct total number of transacted products.
The data has several NA values, and the products are factors.
And here is the raw data (Day ranges from 1 to 28, Product ranges from 1 to 50):