
I want to calculate certain statistics over a list of dictionaries which looks something like this:

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
         {'hello': "world", 'score': 1.75}]

Specifically, I want to find the minimum and maximum of the values stored under the 'score' key, and then replace each score with its normalized value (so the existing dictionaries have to be updated).
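For reference, min-max normalization maps each score x to (x - min) / (max - min), so the smallest score becomes 0.0 and the largest becomes 1.0. A minimal sketch of that formula applied to the sample scores (the helper name min_max_scale is hypothetical, not from the question):

```python
def min_max_scale(x, lo, hi):
    """Map x from the range [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)

scores = [1.2, 1.5, 1.02, 1.75]
lo, hi = min(scores), max(scores)  # 1.02 and 1.75
scaled = [min_max_scale(s, lo, hi) for s in scores]
print(scaled)
```

For the first score this is (1.2 - 1.02) / (1.75 - 1.02) = 0.18 / 0.73, roughly 0.2466.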

I have implemented it the obvious way, as follows, but I was wondering whether there is a better way to do this.

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
         {'hello': "world", 'score': 1.75}]


def min_value(rank_norm):
    list_values = []
    for x in rank_norm:
        list_values.append(x['score'])
    return min(list_values)


def max_value(rank_norm):
    list_values = []
    for x in rank_norm:
        list_values.append(x['score'])
    return max(list_values)


def normalize_dict(rank_norm, min_val, max_val):
    for x in rank_norm:
        x['score'] = (x['score']-min_val)/(max_val - min_val)
    return rank_norm

min_val_list = min_value(list1)
max_val_list = max_value(list1)

print(min_val_list)
print(max_val_list)

print("Original dict:  ", list1)
print("Normalized dict: ", normalize_dict(list1, min_val_list, max_val_list))

I am using Python 3.
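One edge case the code above does not handle: if every score is identical, max_val - min_val is zero and normalize_dict raises ZeroDivisionError. A minimal sketch with a guard, which also collects the scores only once. It assumes that mapping every score to 0.0 is acceptable in that case (an assumption; you might prefer 0.5 or an explicit error), and the name normalize_scores and its key parameter are hypothetical:

```python
def normalize_scores(rank_norm, key='score'):
    """Min-max normalize d[key] for every dict in rank_norm, in place."""
    values = [d[key] for d in rank_norm]  # collect the scores once
    lo, hi = min(values), max(values)
    span = hi - lo
    for d in rank_norm:
        # Assumption: when all scores are equal (span == 0), use 0.0
        # instead of dividing by zero.
        d[key] = (d[key] - lo) / span if span else 0.0
    return rank_norm

same = [{'hello': 'world', 'score': 2.0}, {'hello': 'world', 'score': 2.0}]
print(normalize_scores(same))
```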

Asked by utengr

6 Answers

You can build a new list of normalized dictionaries like this:

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
     {'hello': "world", 'score': 1.75}]
values = [i["score"] for i in list1]
minimum = min(values)
maximum = max(values)
normalized_dict = [{a:b if a == "hello" else (b-minimum)/float(maximum-minimum) for a, b in i.items()} for i in list1]

Output:

[{'score': 0.24657534246575336, 'hello': 'world'}, {'score': 0.6575342465753424, 'hello': 'world'}, {'score': 0.0, 'hello': 'world'}, {'score': 1.0, 'hello': 'world'}]
Answered by Ajax1234

Pure Python

Yes, you can obtain the minimum and maximum without building the list by hand, for example by mapping operator.itemgetter over the list:

from operator import itemgetter

def min_value(rank_norm):
    return min(map(itemgetter('score'),rank_norm))

def max_value(rank_norm):
    return max(map(itemgetter('score'),rank_norm))
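The same functions can also be written with generator expressions, which likewise avoid an intermediate list. Since Python 3.4, min and max accept a default argument, which a sketch like the following could use so that an empty list does not raise ValueError (returning None in that case is an assumption):

```python
def min_value(rank_norm):
    # Generator expression: no temporary list is built.
    return min((x['score'] for x in rank_norm), default=None)

def max_value(rank_norm):
    return max((x['score'] for x in rank_norm), default=None)

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
         {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]
print(min_value(list1), max_value(list1), min_value([]))
```

The extra parentheses around the generator are required because it is not the only argument to the call.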

Your code to normalize the dictionaries is fine. You can, however, use a list comprehension to construct a new list of dictionaries. Unless you specifically need to update the values in place, constructing a new list tends to be safer: other parts of your code may still reference the old list or the old dictionaries, and you probably do not want those to change underneath them:

def normalize_dict(rank_norm, min_val, max_val):
    delta = max_val-min_val
    return [dict(d,score=(d['score']-min_val)/delta) for d in rank_norm]
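To illustrate the aliasing concern: with the in-place version, any other reference to the same list (or the same dictionaries) sees the changed scores, while the copying version leaves them alone. A small sketch; the function names normalize_inplace and normalize_copy are illustrative only:

```python
def normalize_inplace(rank_norm, min_val, max_val):
    # Mutates the existing dictionaries.
    for d in rank_norm:
        d['score'] = (d['score'] - min_val) / (max_val - min_val)
    return rank_norm

def normalize_copy(rank_norm, min_val, max_val):
    # dict(d, score=...) makes a shallow copy of d with 'score' replaced.
    delta = max_val - min_val
    return [dict(d, score=(d['score'] - min_val) / delta) for d in rank_norm]

original = [{'hello': 'world', 'score': 1.0}, {'hello': 'world', 'score': 3.0}]
alias = original  # a second reference to the same list
copied = normalize_copy(original, 1.0, 3.0)
print(alias[1]['score'], copied[1]['score'])  # 3.0 1.0: alias is untouched
normalize_inplace(original, 1.0, 3.0)
print(alias[1]['score'])  # 1.0: alias now sees the mutation
```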

Pandas

If the number of items is large, you can improve performance by using a pandas DataFrame, since arithmetic on a whole column is vectorized:

import pandas as pd

df = pd.DataFrame(list1)
sc = df['score']
sc_mi = sc.min()
df['score'] = (sc-sc_mi)/(sc.max()-sc_mi)

Then the dataframe is:

>>> df
   hello     score
0  world  0.246575
1  world  0.657534
2  world  0.000000
3  world  1.000000

You can keep processing the dataframe, or if you want a list of dictionaries, you can use:

>>> list(df.T.to_dict().values())
[{'hello': 'world', 'score': 0.24657534246575336}, {'hello': 'world', 'score': 0.6575342465753424}, {'hello': 'world', 'score': 0.0}, {'hello': 'world', 'score': 1.0}]
Answered by Willem Van Onsem
  • Just a tiny correction: It should be d['score'] in the return statement instead of d[score]. – utengr Oct 12 '17 at 14:42

You can combine the min/max computation into one step instead of building the list of scores twice and iterating over the list multiple times:

from operator import itemgetter

min_val, max_val = itemgetter(0, -1)(sorted([x['score'] for x in list1]))
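Sorting does the job, but it costs O(n log n), whereas a min() call plus a max() call is two O(n) passes. If you want a genuine single pass, a plain loop can track both values at once. A minimal sketch, assuming a non-empty list (the helper min_max and its key parameter are hypothetical). In CPython the two built-in calls are often faster than a Python-level loop, so it is worth measuring before switching:

```python
def min_max(rank_norm, key='score'):
    """Return (min, max) of d[key] over a non-empty list of dicts in one pass."""
    it = iter(rank_norm)
    lo = hi = next(it)[key]  # raises StopIteration if the list is empty
    for d in it:
        v = d[key]
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
         {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]
min_val, max_val = min_max(list1)
print(min_val, max_val)
```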
Answered by Moses Koledoye

Here are more Pythonic versions of the min and max functions:

def min_value(rank_norm):
    return min([x['score'] for x in rank_norm])

def max_value(rank_norm):
    return max([x['score'] for x in rank_norm])

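They are not much faster, but they are simpler. Here is also a normalize function written as a single expression. It does not look nice, but it works; note that it rebuilds each dictionary from the 'hello' and 'score' keys only, so any other keys are dropped: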

def normalize_dict(rank_norm, min_val, max_val):
    return [{'hello':x['hello'] , 'score':(x['score']-min_val)/(max_val - min_val)} for x in rank_norm]
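Because the one-liner above hardcodes the 'hello' key, a variant that keeps every key and only rewrites the score may be more robust. A sketch using dictionary unpacking (Python 3.5+); the key parameter is a hypothetical addition, not part of the answer:

```python
def normalize_dict(rank_norm, min_val, max_val, key='score'):
    span = max_val - min_val
    # {**x, key: ...} copies every existing key, then overrides the normalized one.
    return [{**x, key: (x[key] - min_val) / span} for x in rank_norm]

rows = [{'hello': 'world', 'id': 1, 'score': 1.0},
        {'hello': 'world', 'id': 2, 'score': 3.0}]
result = normalize_dict(rows, 1.0, 3.0)
print(result)
```

The input dictionaries are not modified, since each one is copied before the score is replaced.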
Answered by Alperen

Pandas

import pandas as pd

your_list = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
     {'hello': "world", 'score': 1.75}]

# Read the list into a pandas DataFrame
d = pd.DataFrame.from_dict(your_list)

your_list mapped into a DataFrame:

print(d)
   hello  score
0  world   1.20
1  world   1.50
2  world   1.02
3  world   1.75

Calculating the statistics and updating the score column:

d['score'] = (d['score'] - d['score'].min()) / (d['score'].max() - d['score'].min())

What d looks like now:

print(d)
   hello     score
0  world  0.246575
1  world  0.657534
2  world  0.000000
3  world  1.000000

Converting the updated DataFrame d back to a list of dictionaries:

updated = d.to_dict(orient='records')
print(updated)

[{'score': 0.24657534246575336, 'hello': 'world'}, {'score': 0.6575342465753424, 'hello': 'world'}, {'score': 0.0, 'hello': 'world'}, {'score': 1.0, 'hello': 'world'}]
Answered by akilat90

And yet another way to use operator.itemgetter: sort the list by score, take the minimum and maximum scores from the two ends, then normalize. Note that this reorders the list in place:

import operator
a = [{'hello': "world3", 'score': 1.2},  .... ]

score = operator.itemgetter('score')
a.sort(key = score)
minimum = score(a[0])
maximum = score(a[-1])
span = maximum - minimum
for d in a:
    d['score'] = (d['score'] - minimum) / span
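If the original order matters, min() and max() accept the same key= function, so the itemgetter can be reused without sorting. A minimal sketch; here a is written out in full with the question's sample data, which is an assumption about what the elided list contains:

```python
import operator

a = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
     {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]

score = operator.itemgetter('score')
# min/max with key= return the whole dict holding the extreme score;
# score(...) then extracts the number from it.
minimum = score(min(a, key=score))
maximum = score(max(a, key=score))
span = maximum - minimum
for d in a:
    d['score'] = (d['score'] - minimum) / span
print(a)
```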
Answered by wwii