          M1                      M2                      M3
   M1_1 M1_2 M1_diff     M2_1 M2_2 M2_diff     M3_1 M3_2 M3_diff
A  55.2 60.8   5.6       66.7 69.8   3.1       58.5 60.3   1.8
B  56.8 55.4   1.4       62.8 63.9   1.1       65.7 69.8   4.1
C  52.3 54.3   2.0       53.8 55.9   1.1       56.7 57.9   1.2

I have to find which of M1, M2, M3 is best for each of A, B, C. The criteria are that Mi_1 and Mi_2 should be as high as possible and Mi_diff as low as possible (i = 1, 2, 3). For example, for id B it may be the second model: B has the lowest diff for M2, so I chose M2 for B. M3 could have been chosen too for its higher accuracy, but its diff is large. I cannot come up with a general algorithm to do this. We could put a cutoff on the diff values and then choose among the M's. For example, if 1.5 is the lower bound for diff, then M3 is best for id B.

The data is quite big: it has almost 1000 unique ids, so this cannot be done manually. I was thinking there may be some easy solution I am not seeing. Can anyone please help? I am using R for my computations.

gagolews
Sayak
  • How is that related to http://stackoverflow.com/questions/24098657/determining-the-best-model-for-forecasting ? – gagolews Jun 07 '14 at 19:24
  • You just need to set a set of rules on how `diff` is related to `1_1` and `1_2` and all the rest should be very straight forward – David Arenburg Jun 07 '14 at 19:29

1 Answer

You just need to come up with some equation that captures your criteria. For instance, since you want Mi_1 and Mi_2 to be as high as possible but their difference to be as low as possible, you may want to maximize, for each model i:

Mi_1 * Mi_2 / |Mi_1 - Mi_2|

You can add coefficients to this equation to increase the importance of any of the terms. Note that the score becomes infinite when the two values are exactly equal.

In R:

# Set RNG seed for reproducibility
set.seed(12345)

# Generate some data
num.rows <- 1000

df <- data.frame(M1_1 = runif(num.rows, 0, 100),
                 M1_2 = runif(num.rows, 0, 100),
                 M2_1 = runif(num.rows, 0, 100),
                 M2_2 = runif(num.rows, 0, 100),
                 M3_1 = runif(num.rows, 0, 100),
                 M3_2 = runif(num.rows, 0, 100))
df$M1_diff <- abs(df$M1_1 - df$M1_2)
df$M2_diff <- abs(df$M2_1 - df$M2_2)
df$M3_diff <- abs(df$M3_1 - df$M3_2)

# We call apply with 1 as the second parameter, 
# meaning the function will be applied to each row
res <- apply(df, 1, function(row) {
  # Our criterion, modify at will
  M1_prod <- row["M1_1"] * row["M1_2"] / row["M1_diff"]
  M2_prod <- row["M2_1"] * row["M2_2"] / row["M2_diff"]
  M3_prod <- row["M3_1"] * row["M3_2"] / row["M3_diff"]

  # Which is the maximum? Returns 1, 2 or 3
  which.max(c(M1_prod, M2_prod, M3_prod))
})

And the output:

> head(df)
      M1_1       M1_2     M2_1     M2_2     M3_1      M3_2   M1_diff  M2_diff   M3_diff
1 72.09039  7.7756704 95.32788 43.06881 27.16464 18.089266 64.314719 52.25907  9.075377
2 87.57732 84.3713648 62.17875 86.29595 62.93161 18.878981  3.205954 24.11720 44.052625
3 76.09823  0.6813684 53.16722 25.12324 85.90863 72.700354 75.416864 28.04398 13.208273
4 88.61246 35.1184204 89.20926 76.34523 36.97298  3.062528 53.494036 12.86403 33.910451
5 45.64810 68.6061032 19.58807 69.40719 28.21637 58.466682 22.958007 49.81913 30.250311
6 16.63718 25.4086494 88.43795 73.68140 81.37349 75.001685  8.771471 14.75656  6.371807
> head(res)
[1] 2 1 3 2 1 3
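
The remark about adding coefficients could be sketched as follows, using the three rows from the question instead of random data. This is only one possible form of the score, not a definitive one: the names `acc`, `score`, `w_acc`, `w_diff` and `eps` are made up for this illustration, the weights enter as exponents (an assumption, other weightings work just as well), and `eps` is a small constant added to avoid dividing by zero when both values are equal. The gap is recomputed from the two accuracy columns rather than read from the `_diff` columns.

```r
# The three ids from the question (A, B, C)
acc <- data.frame(
  id   = c("A", "B", "C"),
  M1_1 = c(55.2, 56.8, 52.3), M1_2 = c(60.8, 55.4, 54.3),
  M2_1 = c(66.7, 62.8, 53.8), M2_2 = c(69.8, 63.9, 55.9),
  M3_1 = c(58.5, 65.7, 56.7), M3_2 = c(60.3, 69.8, 57.9)
)

# Hypothetical weighted score (an assumption, modify at will):
# w_acc rewards high accuracies, w_diff penalises a large gap,
# eps guards against division by zero when x1 == x2.
score <- function(x1, x2, w_acc = 1, w_diff = 1, eps = 1e-8) {
  (x1 * x2)^w_acc / (abs(x1 - x2) + eps)^w_diff
}

models <- c("M1", "M2", "M3")

# One column of scores per model, one row per id
scores <- sapply(models, function(m) {
  score(acc[[paste0(m, "_1")]], acc[[paste0(m, "_2")]])
})

# For each row, take the model whose score is largest
acc$best <- models[max.col(scores, ties.method = "first")]
acc[, c("id", "best")]
```

With the default weights this selects M2 for id B, which agrees with the choice made by hand in the question. Raising `w_acc` above 1 shifts the preference towards models with higher accuracy, and raising `w_diff` punishes large gaps more strongly. Since everything is vectorised over the columns, the same code should work unchanged on the full set of ids.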
nico