
I have a vector of positive numerical values. I need to normalize it so that the sum of the values is 1 (e.g. a probability). This is simple: just use x_i/sum(x) as the weights. But here is the caveat: no weight may be less than some minimum cutoff, and no weight may be greater than some maximum cutoff. This means two things. First, there are cases with no solution (e.g. 3 objects can't be weighted if the maximum cutoff is 0.2, since three weights of at most 0.2 sum to at most 0.6). Second, the "relativity" of the weights is broken. That is, in a standard normalization (where w_i is the weight given to x_i for all i), w_i/w_j = x_i/x_j for all i,j; with cutoffs, this can't be preserved. More formally, I'd like to find a function w = rwc(x, min, max), where x is a vector, that returns a vector of the same length with the following properties:

1) sum(w)=1

2) min <= w_i <= max for all i

3) if x_i <= x_j then w_i <= w_j for all i,j

4) if w_i and w_j are both different from the cutoffs (min and max) then they keep relativity: that is, if min < w_i < max and min < w_j < max then w_i/w_j=x_i/x_j

If there is no solution, NULL should be returned.
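
To make this concrete (the same example comes up in the discussion below): for x = c(4, 4, 4, 1000) with min = 0 and max = 0.7, plain normalization gives a last weight of about 0.988, which exceeds the maximum, so it must be capped and the freed mass redistributed proportionally among the others:

rwc(c(4, 4, 4, 1000), 0, 0.7)  # expected result: 0.1 0.1 0.1 0.7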

So my questions are:

a) How do you suggest doing that (in R or any other language)?

b) Given x, can there be more than one solution (that is, at least two different vectors, w and v, each conforming to the formal requirements above)?

This is not strictly an R problem, but I encountered it within a project I'm doing in R, so I post it as R. Any suggestions for better classification are welcome.

Update

Following the discussion below, and after more thought, it seems necessary to add a fifth requirement to the 4 above: 5) Of all the possible assignments of weights satisfying 1-4, w is the one with the minimum number of extreme weights (min or max).

Here is my R code that (hopefully) does that:

mwc = function(x, mxw = 1, mnw = 0) {
    cake = 1                          # unallocated probability mass
    agnts = 1:length(x)               # indices not yet fixed at a cutoff
    result = numeric(length(x))
    while (cake > 0 & length(agnts) > 0) {
        # split the remaining mass among the remaining indices in proportion to x
        tmp = cake * x[agnts] / sum(x[agnts])
        low = which(tmp < mnw)
        high = which(tmp > mxw)
        if (length(low) + length(high) == 0) {
            result[agnts] = tmp       # tmp already holds one value per remaining index
            break
        }
        # clamp the violators to their cutoffs and drop them from the pool
        if (length(low) > 0) {
            result[agnts[low]] = mnw
        }
        if (length(high) > 0) {
            result[agnts[high]] = mxw
        }
        cake = 1 - sum(result)
        agnts = agnts[-c(low, high)]
    }
    if (cake < 0) return(NULL)  # no solution
    # the tolerance must exceed machine epsilon (~2.2e-16), or valid results get rejected
    if (abs(sum(result) - 1) > 1e-9) return(NULL)
    return(result)
}
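
A few example calls (a hypothetical session; the third one shows the greedy failure that rici points out in the comment below):

mwc(c(1, 2, 3, 4))                            # 0.1 0.2 0.3 0.4 (no cutoff binds)
mwc(c(4, 4, 4, 1000), mxw = 0.7)              # 0.1 0.1 0.1 0.7
mwc(c(1, 1, 1000000), mxw = 0.8, mnw = 0.05)  # NULL, although c(0.1, 0.1, 0.8) satisfies 1-5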
amit
  • Your code is too greedy. A trivial counterexample: min is .05 and max is 0.8; v is [1,1,1000000]. If I understand your code correctly, it will fail, because the first step will assign min/max to all elements in w, leaving a remainder in `cake`. However, [.1, .1, .8] is a valid solution. There are other cases where the algorithm will produce a non-minimal solution, according to your criterion 5. – rici Feb 21 '13 at 22:13
  • ok. true. any suggestions? – amit Feb 22 '13 at 20:24

3 Answers


a)

I suggest a brute-force iterative algorithm.

  1. Let x' = x
  2. Calculate sum(x')
  3. Calculate the cut-off limits min_x = min * sum(x') and max_x = max * sum(x')
  4. Recalculate x', clamping all values outside the range [min_x, max_x]
  5. Repeat 2-4 until x' is stable
  6. Calculate w = x' / sum(x')

In most cases the number of iterations should be few.
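
A minimal R sketch of these steps (my reading, not the answer's own code: the cut-off limits are the target weights scaled by the current sum, and "adjusting" means clamping to those limits):

rwc_iter = function(x, mn, mx, max_iter = 10000) {
    xp = x
    for (i in 1:max_iter) {
        s = sum(xp)
        min_x = mn * s                              # cut-off limits in x'-space
        max_x = mx * s
        adjusted = pmin(pmax(xp, min_x), max_x)     # clamp values outside [min_x, max_x]
        if (isTRUE(all.equal(adjusted, xp))) break  # x' is stable
        xp = adjusted
    }
    w = xp / sum(xp)
    if (any(w < mn - 1e-6) | any(w > mx + 1e-6)) return(NULL)  # no solution within tolerance
    w
}

rwc_iter(c(1, 1, 1000000), 0.05, 0.8)  # converges to 0.1 0.1 0.8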

b) If there is a min or a max (but not both), the solution vector is unique.

If there are both a min and a max, I'm not sure. It feels like it should be unique, but I cannot find an easy proof.

Klas Lindbäck
  • You are right. Initially I was considering only a max. I have edited my answer both for a) and b). – Klas Lindbäck Feb 21 '13 at 14:21
  • ok. thought about it a bit more. it is certainly not unique. consider 10 numbers that are very close to each other, and min=0.05 and max=0.15. then the obvious weighting would be to give all numbers weights very near 0.1 based on their actual values. alternatively you could take the smallest and highest number and set their weights at 0.05 and 0.15 respectively, then set the weights of the 8 other numbers very close to 0.1, and it will still satisfy the conditions. so the solution is not unique. I guess that I can add requirement that there should be minimal use of the cutoff values. – amit Feb 21 '13 at 15:03
  • I thought that the cutoffs were only to be used if the weight would otherwise fall outside the min-max range, but I see now, on a second reading, that there was no such rule. With the current rules, all values could be cutoffs (except maybe the last, that need to be whatever necessary to make the sum 1). – Klas Lindbäck Feb 21 '13 at 16:45
  • ok. after giving some more thought, I will update the original post with the additional requirement, and some R code that does that. comments are still welcome, hoping that there is something more elegant than my solution – amit Feb 21 '13 at 20:30

Do you mean something like this? The example is in Haskell; "[]" means an empty list.

weights :: [Double] -> Double -> Double -> [Double]
weights my_vector min max =                 -- note: min and max here shadow Prelude's functions
  let s = sum my_vector
      try = map (/s) my_vector              -- plain normalization
  in if all (>=min) try && all (<=max) try  -- accept only if every weight is in range
        then try
        else []

OUTPUT:
*Main> weights [1,2,3,4] 0 2
[0.1,0.2,0.3,0.4]
*Main> weights [1,2,3,4] 1 2
[]

UPDATE:
Here's a rough direction (Haskell again), based on this:

import Data.List
import Data.Ord

normalize :: Double -> Double -> Double -> Double -> Double
normalize targetMin targetMax rawMax val =
  let maxReduce = 1 - targetMax/rawMax        -- fraction the largest raw value must shrink by
      factor = maxReduce * (abs val) / rawMax -- smaller values shrink proportionally less
  in max ((1 - factor) * val) targetMin       -- never drop below the minimum

weights :: [Double] -> Double -> Double -> [Double]
weights myVector targetMin targetMax = 
  let try = map (/(sum myVector)) myVector    -- plain normalization
  in if all (>=targetMin) try && all (<=targetMax) try
        then try
        else weights normalized targetMin targetMax  -- retry on the adjusted vector
    where normalized = 
            let targetMax' = (maximum myVector * targetMin / minimum myVector)
            in map (\x -> normalize targetMin targetMax' (maximum myVector) x) myVector

OUTPUT:
*Main> weights [4,4,4,1000] 0.1 0.7
[0.10782286784365082,0.10782286784365082,0.10782286784365082,0.6765313964690475]
*Main> weights [1,1,1000000] 0.05 0.8
[0.12043818322274577,0.12043818322274577,0.7591236335545084]

גלעד ברקן
  • no. that would be too easy. although your example is correct, the code certainly does not try to find any adjustments that will result in a non-null solution – amit Feb 21 '13 at 13:28
  • @amit ...I am confused. Can you give a simple example where x's are adjusted so they satisfy your conditions but do not satisfy w_i/w_j=x_i/x_j for all i,j ? – גלעד ברקן Feb 21 '13 at 14:23
  • for example x = [4,4,4,1000] and the max cutoff is 0.7 and min cutoff is 0, then the weights [0.1,0.1,0.1,0.7] satisfy all conditions, but of course the real weight of the 4th number should have been much higher, so its relations with other weights is broken. – amit Feb 21 '13 at 14:59
  • @groovy: that doesn't satisfy amit's condition 4, which is that the ratio of the (not cutoff) weights is equal to the ratio of the corresponding original values; .61/.13 != 1000/4 – rici Feb 22 '13 at 03:59
  • @rici yes this algorithm was not programmed to do that, only to get a reasonable applied normalization. I actually missed that requirement when working on it. – גלעד ברקן Feb 22 '13 at 04:16
  • @rici please see my second answer. – גלעד ברקן Feb 22 '13 at 06:15

This is my second answer, which, I hope, now addresses requirement 4) as well. It seems to me that if requirement 4) is to hold, then we must divide all elements that are not assigned as cutoffs by:

    denominator = sum non_cutoff_elements / (1 - sum cutoff_elements)

Where 'cutoff_elements' are expressed as their cutoff values. This recursive code tries, I hope, to exhaust the combinations of cutoff assignments. It seems to solve both amit's and rici's examples from their comments. Haskell again:

import Data.List
import Data.Ord

weights :: [Double] -> Double -> Double -> [[Double]]
weights myVector tMin tMax = 
  weights' 0
    where 
      weights' count
        | count == length myVector + 1 = []
        | otherwise =
            -- keep the first 'count' raw values; pin the rest to the max cutoff
            let new_list = take count myVector 
                           ++ replicate (length myVector - count) tMax
            in fromLeft new_list 0 ++ weights' (count+1)
                where 
                  fromLeft list' count' = 
                    let non_cutoffs = filter (\x -> x/=tMin && x/=tMax) list'  -- elements not pinned to a cutoff
                        real_count = length list' - length non_cutoffs
                        cutoffs_l = filter (\x -> x==tMin) list'
                        cutoffs_r = filter (\x -> x==tMax) list'
                        -- the denominator from the formula above
                        denom = sum non_cutoffs / (1 - (sum $ cutoffs_l ++ cutoffs_r))
                        mapped = cutoffs_l ++ (map (/denom) non_cutoffs) ++ cutoffs_r
                        -- turn one max cutoff into a min cutoff for the next attempt
                        next_list = let left = tMin : cutoffs_l
                                        right = drop 1 cutoffs_r
                                    in left ++ non_cutoffs ++ right
                    in if count' == real_count
                          then []
                          else if sum cutoffs_r > 1 || sum cutoffs_l > 1 
                                  || sum cutoffs_r + sum cutoffs_l > 1
                                  then fromLeft next_list (count'+1)
                          else if sum mapped == 1 && all (>=tMin) mapped && all (<=tMax) mapped
                                  then mapped : fromLeft list' (count'+1)  -- keep solutions summing to 1 and in range
                                  else fromLeft next_list (count'+1)

OUTPUT:
*Main> weights [4,4,4,1000] 0.1 0.7
[[0.1,0.1,0.1,0.7],[0.1,0.1,0.10000000000000009,0.7],[0.1,0.10000000000000003,0.10000000000000003,0.7],[0.10000000000000002,0.10000000000000002,0.10000000000000002,0.7]]
Rounded to 14 decimal places: [[0.1,0.1,0.1,0.7],[0.1,0.1,0.1,0.7],[0.1,0.1,0.1,0.7],[0.1,0.1,0.1,0.7]]

*Main> weights [1,1,1000000] 0.05 0.8
[[5.0e-2,0.1499999999999999,0.8],[9.999999999999998e-2,9.999999999999998e-2,0.8]]
Rounded to 14 decimal places: [[0.05,0.15,0.8],[0.1,0.1,0.8]]
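
As a quick sanity check of the denominator formula at the top, here is a small R sketch (hypothetical names, not part of the answer's code) reproducing the second solution of rici's example, with the third element pinned at the max cutoff:

x_free = c(1, 1)                  # raw values of the non-cutoff elements
cutoffs = c(0.8)                  # elements already assigned their cutoff value
denom = sum(x_free) / (1 - sum(cutoffs))
c(x_free / denom, cutoffs)        # 0.1 0.1 0.8 -- sums to 1, relativity preserved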

גלעד ברקן
  • wow...I admit I couldn't follow the code to the end, so I don't completely understand what you did. might be because Haskell is not my forte. – amit Feb 22 '13 at 21:24
  • @amit ...if you understand my premise at the top (i.e., for each case of assigning/converting value/s to cutoffs there is only one denominator to apply to the remaining values that could satisfy your conditions) then all I have done is try all combinations of cutoff assignments, e.g., [CU VAL VAL VAL], [CU VAL VAL CU], [CU, CU, VAL, CU] etc., then apply the denominator from the formula at the top and see if the sum equals 1 and the values are within range. Personally, I kind of like the smoother normalization approximation which didn't follow rule 4 (in my other answer...) Anyway, thx was fun – גלעד ברקן Feb 22 '13 at 22:50