I am developing a recommendation engine, and I don't think I can keep the whole similarity matrix in memory. I calculated the pairwise similarities for 10,000 items, which already comes to over 40 million floats; stored in a binary file, that is 160 MB.
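To make the scale concrete, here is a minimal sketch of the kind of thing I'm doing. The cosine similarity and the random placeholder features are just assumptions for illustration, not my real pipeline (which keeps only about 40 million of the values):

```python
import numpy as np

# Placeholder item features; in reality these come from my own data.
n_items, n_features = 10_000, 100
features = np.random.rand(n_items, n_features).astype(np.float32)

# Normalize rows so a dot product gives cosine similarity.
norms = np.linalg.norm(features, axis=1, keepdims=True)
normalized = features / norms

# Full item-item similarity matrix: 10,000 x 10,000 float32 values,
# i.e. about 400 MB dense (I keep ~40 million of them, ~160 MB).
similarity = normalized @ normalized.T

# Dump the matrix to a raw binary file on disk.
similarity.tofile("similarity.bin")
```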
And that is already a lot. The problem is that I could have nearly 200,000 items. Even if I cluster them into several groups and build a similarity matrix for each group, I still have to load those matrices into memory at some point, and that will consume a lot of memory.
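Just to show why this worries me, here is the back-of-the-envelope arithmetic, assuming dense float32 matrices and, purely as an example, an even split into 20 clusters:

```python
# Rough memory estimates for a dense float32 similarity matrix.
def dense_matrix_bytes(n_items: int, bytes_per_float: int = 4) -> int:
    return n_items * n_items * bytes_per_float

# ~0.4 GB at 10,000 items, but ~160 GB at 200,000 items.
print(f"10,000 items:  {dense_matrix_bytes(10_000) / 1e9:.1f} GB")
print(f"200,000 items: {dense_matrix_bytes(200_000) / 1e9:.1f} GB")

# Even split into, say, 20 clusters of 10,000 items each, the per-cluster
# matrices together still add up to about 8 GB.
print(f"20 clusters of 10,000: {20 * dense_matrix_bytes(10_000) / 1e9:.1f} GB")
```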
So, is there any way to deal with this much data?
How should I store it and load it into memory while making sure my engine still responds to an input reasonably fast?