I am trying to create a market basket matrix from data that looks like the following:
input <- matrix( c(1000001,1000001,1000001,1000001,1000001,1000001,1000002,1000002,1000002,1000003,1000003,1000003,100001,100002,100003,100004,100005,100006,100002,100003,100007,100002,100003,100008), ncol=2)
This represents the following data:
colnames(input) <- c( "Customer" , "Product" )
From this I want to create a matrix with one row per customer and one column per product. This can be done by first creating a matrix of zeros:
input <- as.data.frame(input)
m <- matrix(0, length(unique(input$Customer)), length(unique(input$Product)))
rownames(m) <- unique(input$Customer)
colnames(m) <- unique(input$Product)
This is all fast enough (my data has 750,000+ rows, which produces a 15,000 by 1,500 matrix), but now I want to fill the matrix, setting a cell to 1 wherever a customer has bought a product:
for( i in 1:nrow(input) ) {
m[ as.character(input[i,1]),as.character(input[i,2])] <- 1
}
I think there has to be a more efficient way to do this, as I learned from Stack Overflow that for loops can often be avoided. So the question is: is there a faster way?
I need the data in a matrix because I would like to use packages like caret. After that I will probably run into the same problem as in "R memory management advice (caret, model matrices, data frames)", but that's a concern for later.
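For comparison, here is a minimal sketch of one loop-free way to fill the matrix, using base R matrix indexing with `match()`. It rebuilds the toy data from above as a data frame; the names `customers`, `products`, and `idx` are introduced only for this illustration, and it is one possible approach rather than a definitive answer.

```r
# Toy data from the question: one row per (Customer, Product) purchase
input <- data.frame(
  Customer = c(rep(1000001, 6), rep(1000002, 3), rep(1000003, 3)),
  Product  = c(100001, 100002, 100003, 100004, 100005, 100006,
               100002, 100003, 100007,
               100002, 100003, 100008)
)

# Distinct customers and products, in order of first appearance
customers <- unique(input$Customer)
products  <- unique(input$Product)

# Zero matrix with customers as rows and products as columns
m <- matrix(0, length(customers), length(products),
            dimnames = list(as.character(customers), as.character(products)))

# Two-column index matrix: row position and column position of each purchase
idx <- cbind(match(input$Customer, customers),
             match(input$Product, products))

# Indexing by a two-column matrix sets all (row, column) pairs in one call
m[idx] <- 1
```

Because `idx` holds one (row, column) pair per purchase, the assignment replaces the whole for loop with a single vectorized call. Repeated purchases of the same product by the same customer simply set the same cell to 1 again.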