Store the values for each key as an array in a dictionary

Question

I would like to normalize all values in the dictionary data and store them in another dictionary with the same keys, where the values for each key are stored in a 1D array. I did the following:

>>> data = {1: [0.6065306597126334], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}

>>> norm = {k: [v / sum(vals) for v in vals] for k, vals in data.items()} 

>>> norm
{1: [1], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

Now suppose the dictionary data contains only a zero value for one of its keys, like the value of the first key, 1:

>>> data = {1: [0.0], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}

Then normalizing the values of this dictionary produces [nan] values because of the division by zero:

>>> norm = {k: [v / sum(vals) for v in vals] for k, vals in data.items()}

__main__:1: RuntimeWarning: invalid value encountered in double_scalars
>>> norm
{1: [nan], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

So I inserted an if statement to overcome this issue, but I can't store the values for each key as a 1D array.

The code:

>>> norm = {}
>>> for k, vals in data.items():
...     values = []
...     if sum(vals) == 0:
...        values.append(list(vals))
...     else:
...          for v in vals:
...              values.append(list([v/sum(vals)]))
...     norm[k]=values
... 
>>> norm
{1: [[1.0]], 2: [[0.4498162176582741], [0.4498162176582741], [0.10036756468345168]], 3: [[0.4498162176582741], [0.4498162176582741], [0.10036756468345168]], 4: [[0.5], [0.5]]}

I would like to get the norm dictionary as

norm = {1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

Also, when the dictionary data contains a zero value for one of its keys, is there a better way to normalize it? I think my solution is not efficient.

P.S.: At the end of the for loop I tried norm[k] = np.array(values) instead of norm[k] = values, but the result was not as required.

Change both your `append` to `extend`. Also, no need to create a `list` from what is being extended, it already is a list — yatu, Mar 21 '19 at 11:08

score 1 · Answer 1 · answered Mar 21 '19 at 11:17

As mentioned above, append adds a single element to a list, and that element can itself be a list, which is why you currently have a list within a list. Ideally, you should use extend, which adds each item of another list to the first list.
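
A minimal sketch of the difference, using made-up values (the names `nested` and `flat` are illustrative and do not come from the question):

```python
# Illustrative values only; not taken from the question's data.
nested = []
nested.append([0.5, 0.5])   # adds the whole list as ONE element
print(nested)               # [[0.5, 0.5]]

flat = []
flat.extend([0.5, 0.5])     # adds each item of the list separately
print(flat)                 # [0.5, 0.5]
```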

walugembe peter

DSC · Accepted Answer · 2019-03-21T12:38:36.023

As mentioned in an answer, extend can be used to solve your problem. If you do want to use append, you could take the first element of your lists.

norm = {}
for k, vals in data.items():
    values = []
    if sum(vals) == 0:
        values.append(vals[0])
    else:
        for v in vals:
            values.append([v / sum(vals)][0])
    norm[k] = values

See "Difference between append vs. extend list methods in Python" for an example of append versus extend.

As for the optimization: completely removing the for loops isn't possible, but you can shorten your solution while keeping it readable:

norm = {}
for k, vals in data.items():
    if sum(vals) == 0:
        norm[k] = vals
    else:
        norm[k] = [x / sum(vals) for x in vals]
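
One possible refinement, sketched under assumptions: the loop above calls sum(vals) once per element, so a small helper can compute it once per key instead. The helper name `normalize` and the sample data are hypothetical. Like the loop above, it leaves an all-zero list unchanged, whereas the question's desired output uses 1.0 there.

```python
def normalize(vals):
    """Return vals scaled to sum to 1; leave them unchanged if the sum is 0."""
    total = sum(vals)          # computed once per key
    if total == 0:
        return list(vals)      # copy, so norm does not share lists with data
    return [v / total for v in vals]

# Hypothetical sample data in the same shape as the question's dictionary.
data = {1: [0.0], 2: [1.0, 1.0, 2.0], 4: [3.0, 3.0]}
norm = {k: normalize(vals) for k, vals in data.items()}
print(norm)  # {1: [0.0], 2: [0.25, 0.25, 0.5], 4: [0.5, 0.5]}
```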

edited Mar 21 '19 at 12:38

answered Mar 21 '19 at 11:26

DSC

Thanks for the help. Is there a more efficient way to get this dictionary instead of all of these loops? – Noah16 Mar 21 '19 at 12:24
@Noah16 I have updated my answer. Please upvote and accept my answer if it solves your problem. – DSC Mar 21 '19 at 12:40

score 0 · Answer 3 · answered Mar 21 '19 at 19:35

Your dict/list comprehension fails when sum(vals) == 0:

>>> data = {1: [0.0], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}
>>> {k: [v / sum(vals) for v in vals] for k, vals in data.items()}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <dictcomp>
  File "<stdin>", line 1, in <listcomp>
ZeroDivisionError: float division by zero

You can introduce a ternary expression to handle the case:

>>> {k: [v / sum(vals) if sum(vals)!=0 else 1.0 for v in vals] for k, vals in data.items()}
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

If you want to avoid evaluating sum(vals) multiple times:

>>> {k: [v / s if s!=0 else 1.0 for v in vals] for k,vals,s in ((k, vals, sum(vals)) for k, vals in data.items())}
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

((k, vals, sum(vals)) for k, vals in data.items()) is a generator expression that yields k, vals and sum(vals) for every item.
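
To make the inner generator easier to follow, here is a small sketch that materializes it with list() before using it in the comprehension. The sample data and the name `triples` are assumptions for illustration only:

```python
# Hypothetical sample data in the same shape as the question's dictionary.
data = {1: [0.0], 2: [1.0, 3.0]}

# Each item is a (key, values, sum of values) triple.
triples = ((k, vals, sum(vals)) for k, vals in data.items())
print(list(triples))  # [(1, [0.0], 0.0), (2, [1.0, 3.0], 4.0)]

# The same generator feeding the comprehension; s is computed once per key.
norm = {k: [v / s if s != 0 else 1.0 for v in vals]
        for k, vals, s in ((k, vals, sum(vals)) for k, vals in data.items())}
print(norm)  # {1: [1.0], 2: [0.25, 0.75]}
```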

score 0 · Answer 4 · answered Mar 05 '23 at 21:52

This should work as well:

norm = {k: [v / sum(vals) for v in vals] if sum(vals)!=0 else [1] for k, vals in data.items() }

obd
