Calculating min and max over a list of dictionaries for normalizing dictionary values

Question

I want to calculate certain statistics over a list of dictionaries which looks something like this:

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
         {'hello': "world", 'score': 1.75}]

Specifically, I want to find the min and max of the values associated with the score key, and then normalize those values (meaning I have to update the existing dictionaries).

I have implemented it the obvious way, as follows. Is there a better way to achieve this?

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
         {'hello': "world", 'score': 1.75}]


def min_value(rank_norm):
    list_values = []
    for x in rank_norm:
        list_values.append(x['score'])
    return min(list_values)


def max_value(rank_norm):
    list_values = []
    for x in rank_norm:
        list_values.append(x['score'])
    return max(list_values)


def normalize_dict(rank_norm, min_val, max_val):
    for x in rank_norm:
        x['score'] = (x['score']-min_val)/(max_val - min_val)
    return rank_norm

min_val_list = min_value(list1)
max_val_list = max_value(list1)

print(min_val_list)
print(max_val_list)

print("Original dict:  ", list1)
print("Normalized dict: ", normalize_dict(list1, min_val_list, max_val_list))

I am using Python 3.

score 4 · Answer 1 · answered Oct 12 '17 at 14:10

You can build a normalized copy of your list of dictionaries like this (the original list is left unchanged):

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
     {'hello': "world", 'score': 1.75}]
values = [i["score"] for i in list1]
minimum = min(values)
maximum = max(values)
normalized_dict = [{a:b if a == "hello" else (b-minimum)/float(maximum-minimum) for a, b in i.items()} for i in list1]

Output:

[{'score': 0.24657534246575336, 'hello': 'world'}, {'score': 0.6575342465753424, 'hello': 'world'}, {'score': 0.0, 'hello': 'world'}, {'score': 1.0, 'hello': 'world'}]

Willem Van Onsem · Accepted Answer · 2017-10-12T14:55:31.843

Pure Python

Yes, you can pass a lazy iterable straight to min and max (here map with itemgetter; a generator expression or list comprehension works just as well):

from operator import itemgetter

def min_value(rank_norm):
    return min(map(itemgetter('score'),rank_norm))

def max_value(rank_norm):
    return max(map(itemgetter('score'),rank_norm))

Your code to normalize the dictionaries is fine. You can, however, use a list comprehension to construct a new list of dictionaries. If you do not need to update the values in place, it tends to be safer to construct a new list, since other parts of your code may still reference the old list or the old dictionaries, and you may not want those to change:

def normalize_dict(rank_norm, min_val, max_val):
    delta = max_val-min_val
    return [dict(d,score=(d['score']-min_val)/delta) for d in rank_norm]
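
As a rough usage sketch of the non-mutating approach, the snippet below combines the min/max lookup and the normalization in one function and checks that the original list is untouched. The name normalize_scores and the key parameter are hypothetical, introduced only for illustration. It also adds one assumption that is not in the code above: if all scores are equal, delta is zero and the division would raise ZeroDivisionError, so this sketch maps every score to 0.0 in that case.

```python
from operator import itemgetter


def normalize_scores(records, key='score'):
    """Return a new list of dicts with `key` min-max scaled to [0, 1].

    Hypothetical helper for illustration; raises ValueError on an empty list,
    because min() and max() do.
    """
    values = list(map(itemgetter(key), records))
    lo, hi = min(values), max(values)
    delta = hi - lo
    if delta == 0:
        # Assumption: when every score is identical, map all of them to 0.0
        # instead of dividing by zero.
        return [dict(d, **{key: 0.0}) for d in records]
    # dict(d, ...) copies each dictionary, so the input is not modified.
    return [dict(d, **{key: (d[key] - lo) / delta}) for d in records]


list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
         {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]

normalized = normalize_scores(list1)
print(normalized[2]['score'])  # 0.0, the smallest score
print(list1[2]['score'])       # 1.02, the original is unchanged
```

Whether 0.0 is the right fallback for a zero range depends on what the normalized scores are used for; raising an error may be preferable in some settings.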

Pandas

If the number of items is huge, you can boost performance by using a pandas dataframe, since the arithmetic is then applied to the whole column at once:

import pandas as pd

df = pd.DataFrame(list1)
sc = df['score']
sc_mi = sc.min()
df['score'] = (sc-sc_mi)/(sc.max()-sc_mi)

Then the dataframe is:

>>> df
   hello     score
0  world  0.246575
1  world  0.657534
2  world  0.000000
3  world  1.000000

You can keep processing the dataframe, or if you want a list of dictionaries, you can use:

>>> list(df.T.to_dict().values())
[{'hello': 'world', 'score': 0.24657534246575336}, {'hello': 'world', 'score': 0.6575342465753424}, {'hello': 'world', 'score': 0.0}, {'hello': 'world', 'score': 1.0}]

Just a tiny correction: It should be d['score'] in the return statement instead of d[score]. — utengr, Oct 12 '17 at 14:42

score 2 · Answer 3 · answered Oct 12 '17 at 14:06

You can combine the min/max computation into one step, instead of building the list of scores twice and going over it multiple times:

from operator import itemgetter

min_val, max_val = itemgetter(0, -1)(sorted([x['score'] for x in list1]))

Moses Koledoye

The problem is that sorting works in *O(n log n)* whereas calculating min/max is done in *O(n)*. +1 nevertheless for elegance :) – Willem Van Onsem Oct 12 '17 at 14:06
@WillemVanOnsem That's from the complexity angle. Wall clock time for building and sorting one list would be less than that for min/max of two built lists. – Moses Koledoye Oct 12 '17 at 14:08
Besides, for small `n`, `n log n ≈ n` – Moses Koledoye Oct 12 '17 at 14:12
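
For reference, both extremes can also be found in a single O(n) pass without sorting and without building the list of scores twice. The following is only a minimal sketch; min_max is a hypothetical helper, not a standard-library function. A hand-written Python loop may well be slower in wall-clock time than calling the built-in min and max on one prebuilt list, so measure before preferring it.

```python
def min_max(values):
    """Return (smallest, largest) from one pass over an iterable.

    Hypothetical helper for illustration only.
    """
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("min_max() arg is an empty iterable") from None
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi


list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
         {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]

# A generator expression avoids materializing the scores as a list.
min_val, max_val = min_max(d['score'] for d in list1)
print(min_val, max_val)  # 1.02 1.75
```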

Alperen · Answer 4 · 2017-10-13T06:20:27.563

Here are more Pythonic versions of the min and max functions:

def min_value(rank_norm):
    return min([x['score'] for x in rank_norm])

def max_value(rank_norm):
    return max([x['score'] for x in rank_norm])

Not much faster, but simpler. Also, here is the normalize function as a single-line expression. It doesn't look nice, but it works:

def normalize_dict(rank_norm, min_val, max_val):
    return [{'hello':x['hello'] , 'score':(x['score']-min_val)/(max_val - min_val)} for x in rank_norm]

akilat90 · Answer 5 · 2017-10-12T14:36:42.093

Pandas

import pandas as pd

your_list = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
     {'hello': "world", 'score': 1.75}]

# Read the list into a pandas dataframe
d = pd.DataFrame.from_dict(your_list)

your_list mapped into a dataframe:

print(d)
   hello  score
0  world   1.20
1  world   1.50
2  world   1.02
3  world   1.75

Calculating the stats and updating the score column:

d['score'] = (d['score'] - d['score'].min()) / (d['score'].max() - d['score'].min())

How d looks now:

print(d)
   hello     score
0  world  0.246575
1  world  0.657534
2  world  0.000000
3  world  1.000000

Writing the updated dataframe d to a list of dictionaries:

updated = d.to_dict(orient='records')
print(updated)

[{'score': 0.24657534246575336, 'hello': 'world'}, {'score': 0.6575342465753424, 'hello': 'world'}, {'score': 0.0, 'hello': 'world'}, {'score': 1.0, 'hello': 'world'}]

score 1 · Answer 6 · answered Oct 12 '17 at 14:38

And yet another way to use operator.itemgetter: sort the list by score, extract the min and max scores, then normalize. Note that this reorders the list in place:

import operator
a = [{'hello': "world3", 'score': 1.2},  .... ]

score = operator.itemgetter('score')
a.sort(key = score)
minimum = score(a[0])
maximum = score(a[-1])
span = maximum - minimum
for d in a:
    d['score'] = (d['score'] - minimum) / span
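
If the original order of the list matters, the same itemgetter can be passed as the key to min and max instead of sorting. This is a sketch of that variation, not part of the approach above; the sample data is made up for illustration.

```python
import operator

# Made-up sample data; the 'hello' labels only make the ordering visible.
a = [{'hello': "a", 'score': 1.2}, {'hello': "b", 'score': 1.5},
     {'hello': "c", 'score': 1.02}, {'hello': "d", 'score': 1.75}]

score = operator.itemgetter('score')
# min/max return the whole dictionary, so apply score() again to get the value.
minimum = score(min(a, key=score))
maximum = score(max(a, key=score))
span = maximum - minimum

# Update in place, as in the sorted version, but without reordering the list.
for d in a:
    d['score'] = (d['score'] - minimum) / span

print([d['hello'] for d in a])  # ['a', 'b', 'c', 'd']
```

As with the sorted version, span is zero when all scores are equal, so that case would need separate handling.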
