R: Which approach should I use to optimize goals to achieve best results based on historical data?

Question

1- What I have

I have a historical data, with the following datas per store (about 5.000 stores)

(%) Mix of premium product over total product (per month)
Monthly Goals setted for the (%) of this premium product over total product

Ex:

      JUNE        |       JULY
STORE  Mix   Goal | STORE  Mix   Goal   
  A    20%   19%  |   A    20%   21%
  B    18%   18%  |   B    17%   20%
  C    25%   27%  |   C    26%   26%

From each month I will have a growth in the mix from previous month, and a growth in the goal from previous month mix

Ex:

              JUNE -> JULY
STORE  Growth in Mix   Growth in Goal
  A         0%              +1%
  B        -1%              +2%
  C         1%              +1%

2- What I want to do

I want to do the following in the most optmized way in R:

Use historical data to calculate next goals to each store dividing stores in clusters

So R should:

Divide in clusters (ex: 0% - 10%, 10%-20%, etc)
For each cluster calculate the best growth in goal(goal for month - mix previous month) that should provide to the store the best mix result

3- Example of possible solution:

Range of Mix | Growth in goal
   0-10%     |      +3%
   10-18%    |     +2,5%
   18%-27%   |     +1,5%
    ...      |      ...

In this example of solution the next goal (in August) for each store would be:

Company A: 20%(mix july) + 1,5% (suggested goal for range 18-27%) = 21,5%
Company B: 17%(mix july) +2,5% (suggested goal for range 18-27%) = 19,5%
Company C: 27,5%

4 - Reproducible Example

> store <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
> mix_june <- c(0.20, 0.18, 0.25, 0.30, 0.10, 0.25, 0.38, 0.41, 0.28, 0.12)
> goal_june <- c(0.19, 0.18, 0.27, 0.30, 0.11, 0.23, 0.35, 0.43, 0.27, 0.13) 
> mix_july <- c(0.20, 0.17, 0.26, 0.31, 0.10, 0.24, 0.34, 0.42, 0.27, 0.11) 
> goal_july <- c(0.20, 0.20, 0.26, 0.33, 0.09, 0.25, 0.30, 0.44, 0.26, 0.13) 
> mix_august <- c(0.24, 0.15, 0.26, 0.33, 0.10, 0.23, 0.35, 0.41, 0.29, 0.12) 
> goal_august <- c(0.21, 0.22, 0.27, 0.33, 0.11, 0.26, 0.32, 0.40, 0.28, 0.13) 
> base <- data.frame(store, mix_june, goal_june, mix_july, goal_july, mix_august, goal_august)

> base2 <- data.frame(store = base$store, growth_mix_july = base$mix_july- base$mix_june, growth_goal_july = base$goal_july - base$goal_june, growth_mix_august = base$mix_august- base$mix_july, growth_growth_august = base$goal_august - base$mix_july)

Thank you very much in advance

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. The way you have currently shared your data makes it difficult to copy/paste into R. And before you ask for the "most optmized" way, have you tried anything at all? Where exactly are you getting stuck or what have you done that's inefficient? And if you want recommendations for data modeling, you should ask at [stats.se] rather than SO — MrFlick, Nov 29 '18 at 16:22
Thank you for the advice. I edited and added now a reproducible example code — André Segadas Figueiredo, Nov 29 '18 at 18:10

R: Which approach should I use to optimize goals to achieve best results based on historical data?

0 Answers0