0

I would like to normalize the values of a Python dictionary, while maintaining a lower bound for each value, where the lower bound is given by 1/(10 * n), where n is the number of values in the dictionary.

The values of a dictionary d can be normalized as below.

from __future__ import division
d = {1:0.5, 2: 1.5, 3: 2.0, 4: 0.02, 5: 3.1}
total = sum(d.values())
d = {k: v / total for k, v in d.iteritems()}
print d

The lower bound in this case is 0.02

lb = 1 / (10 * len(d.keys()))
print lb

And some values are less than the lower bound after normalization. The problem is that if I set the values less than the lb to lb, the sum of values becomes greater than one. The following approach does not work

D = dict(zip(d.keys(),[lb if i/total < lb else i/total for i in d.values()]))
print sum(D.values())

We need a way to repeatedly normalize and set lb till all values are more than lb and the sum of values is 1. What would the Python way solve this problem?

user58925
  • 1,537
  • 5
  • 19
  • 28
  • `dict(zip(d.keys(),[lb if i/total < lb else i/total for i in d.values()]))` you can actually refer this answer https://stackoverflow.com/questions/26785354/normalizing-a-list-of-numbers-in-python – Chandu Sep 24 '18 at 08:41
  • let us call the dict created by this method as D, you'll see the sum(D.values()) does not equal. Once you set values below lb to lb, the sum of values is no longer 1, which is the issue. – user58925 Sep 24 '18 at 08:51
  • Unrelated, but Python doesn't guarantee that `d.keys()` and `d.values()` will iterate in the same order. `dict(zip(d.keys(), d.values()))` is not necessarily equal to `d`. – Lie Ryan Sep 24 '18 at 09:04
  • @LieRyan https://stackoverflow.com/a/835111/4799172 – roganjosh Sep 24 '18 at 09:07
  • what about upper bound? – Chandu Sep 24 '18 at 09:12
  • No upper bound; Each value will be less than one because they have to sum to one, but no explicit upper bound. – user58925 Sep 24 '18 at 09:15
  • `x = {1:0.5, 2: 1.5, 3: 2.0, 4: 0.02, 5: 3.1} lb = 1/(10*len(list(x.values()))) def run(lst): d = [lb if i/total < lb else i/total for i in l] if any(i – Chandu Sep 24 '18 at 09:33
  • I am getting sum `1.0171910112359552` – Chandu Sep 24 '18 at 09:33
  • hmm. We need a sum of 1 – user58925 Sep 24 '18 at 09:35
  • Actually we can try recursion, please check the code and try to optimize and i cannot post it as answer unless I am confident abt it. – Chandu Sep 24 '18 at 09:36
  • I check the code in your previous comment, but the sum of values was greater than one, perhaps my question isn't clear enough, sorry about that. – user58925 Sep 24 '18 at 10:05

1 Answers1

0

The code below solves the problem, but I am not sure its the fastest or cleanest method. I'd appreciate any improvement in the speed of the code below.

from __future__ import division

Z = {1: 0.5, 2: 1.5, 3: 2.0, 4: 0.02, 5: 3.1}

total = sum(Z.values())

lb = 1 / (10 * len(Z.keys()))
print lb, "lower bound"

U = []
for k in Z.keys():
    if Z[k] > lb:
        U.append(k)
B = []
for k in Z.keys():
    if Z[k] <= lb:
        U.append(k)

def setBound():
    count = 0
    for k in U:
        w = Z[k]
        if w < lb:
            Z[k] = lb
            U.remove(k)
            B.append(k)
            count += 1
    return count

def normalize():
    bounded_sum = lb * len(B)
    unbounded_sum = 0
    for k in U:
        unbounded_sum += Z[k]
    multiple = (1-bounded_sum) / unbounded_sum
    for k in U:
        Z[k] *= multiple



def adjust():
    normalize()
    count = setBound()
    if count > 0:
        normalize()
        return adjust()
    else:
        return

adjust()
print Z
print sum(Z.values()), "sum of all values"
user58925
  • 1,537
  • 5
  • 19
  • 28