Let's assume that I am the owner of a burger shop. I log every time that a costumer buys something from my shop, so I have the registries of all burgers and milk-shakes sold on the previous month. For me, It is easier and cheaper to make 20 milk-shakes at once than making 1 at time. So here is my goal:
- I want to know how often does a client purchase milk-shake after buying a burger, in order to estimate how many milk-shakes I should prepare in the next hour based on how many burgers I have sold.
What I am planning to do is going from row to row in my burgers dataset and checking if the client that bought that burger bought a milk-shake in 1 hour or less. But this would be O(M^B), because for each burger sold I would have to go through all the row at my milk-shake dataset. What could be a more efficient way to do it?
This is just the first approach to check if there is any correlation between those two products. A more complex model would be the next step.
(M is the number of rows of my milk-shake table, and B of my burger's table)