Let's assume that I am the owner of a burger shop. I log every purchase a customer makes in my shop, so I have records of all burgers and milk-shakes sold in the previous month. For me, it is easier and cheaper to make 20 milk-shakes at once than to make 1 at a time. So here is my goal:

  • I want to know how often a client purchases a milk-shake after buying a burger, in order to estimate how many milk-shakes I should prepare in the next hour based on how many burgers I have sold.

What I am planning to do is go row by row through my burgers dataset and check whether the client who bought that burger also bought a milk-shake within 1 hour. But this would be O(M·B), because for each burger sold I would have to go through all the rows in my milk-shake dataset. What would be a more efficient way to do it?

This is just the first approach to check if there is any correlation between those two products. A more complex model would be the next step.

(M is the number of rows in my milk-shake table, and B is the number of rows in my burger table.)
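
One possible way to avoid the nested scan, shown as a minimal sketch rather than a definitive solution: group the milk-shake timestamps by customer, sort each group once, and binary-search it for every burger. The post names no language, so Python is an assumption here, as are the record layout of `(customer_id, timestamp)` pairs and the hypothetical function name `burger_to_shake_rate`. Under those assumptions the cost is roughly O((M + B) log M) instead of O(M·B).

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta


def burger_to_shake_rate(burgers, shakes, window=timedelta(hours=1)):
    """Fraction of burger sales followed by a milk-shake sale to the same
    customer within `window`.

    Assumed layout (not from the original post): both arguments are
    iterables of (customer_id, timestamp) pairs.
    """
    # Group milk-shake timestamps by customer, then sort each group once.
    shake_times = defaultdict(list)
    for customer, ts in shakes:
        shake_times[customer].append(ts)
    for times in shake_times.values():
        times.sort()

    followed = 0
    total = 0
    for customer, ts in burgers:
        total += 1
        times = shake_times.get(customer, [])
        # Index of the first milk-shake bought at or after the burger.
        i = bisect_left(times, ts)
        if i < len(times) and times[i] - ts <= window:
            followed += 1
    return followed / total if total else 0.0


# Made-up sample data, for illustration only.
day = datetime(2018, 7, 5)
burgers = [
    ("ana", day.replace(hour=12, minute=0)),
    ("bob", day.replace(hour=12, minute=10)),
    ("ana", day.replace(hour=15, minute=0)),
    ("cid", day.replace(hour=13, minute=0)),
]
shakes = [
    ("ana", day.replace(hour=12, minute=30)),  # 30 min after ana's first burger
    ("bob", day.replace(hour=13, minute=30)),  # 80 min after bob's burger
    ("cid", day.replace(hour=12, minute=50)),  # before cid's burger
]

rate = burger_to_shake_rate(burgers, shakes)
print(rate)
```

Multiplying this rate by the number of burgers sold in the last hour gives a rough first estimate of how many milk-shakes to prepare. It says nothing about whether the relationship is causal or stable over time.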

Gabriel Bessa
  • Interesting problem. It would be better if you could provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to let others help better. – Mankind_008 Jul 05 '18 at 18:02
  • Sounds like a time series forecasting problem that requires something (a model) more sophisticated than just checking "if the client that bought that burger bought a milk-shake in 1 hour or less". Maybe better to ask in [Cross Validated](https://stats.stackexchange.com/). – acylam Jul 05 '18 at 18:03
  • @useR First I want to check if the correlation exists. If so, I would build a better model. – Gabriel Bessa Jul 05 '18 at 18:11
  • @Mankind_008 I will build this example and post it, give me a second. – Gabriel Bessa Jul 05 '18 at 18:11
  • To answer your question about finding a link between burgers and milkshakes: have you looked at the [apriori algorithm](https://en.wikipedia.org/wiki/Apriori_algorithm) to see whether burgers and milkshakes are bought together often enough to be worth modelling? [cSPADE](https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Sequence_Mining/SPADE) is also nice. Neither of them is a time series model. – s__ Jul 09 '18 at 09:26

0 Answers