2

I have two sorted dictionaries, i.e. they are now represented as lists. I would like to retrieve the ranking position of each element in each of the lists and store it in a variable so that ultimately I can compute the weighted average of the ranking score of each elements in both lists. Here is an example.

dict1 = {'class1': 15.17, 'class2': 15.95, 'class3': 15.95}

sorted_dict1 = [('class1', 15.17), ('class2', 15.95), ('class3', 15.95)]

sorted_dict2 = [('class2', 9.10), ('class3', 9.22), ('class1', 10.60)]

So far I can retrieve the ranking position of each element in the list and print the ranking but when I try to compute the weighted average of the ranking score i.e. [(w1*a + w2*b)/(w1+w2)], where "a" is the ranking position in sorted_dict1 and "b" is the ranking position in sorted_dict2, the numbers that I get are not the correct weighted average numbers.

Attempted various things, here is one:

for idx, val in list(enumerate(sorted_dict1, 1)):
    for idx1, val1 in list(enumerate(sorted_dict2, 1)):
         position_dict1 = idx
         position_dict2 = idx1
    weighted_average = float((0.50*position_dict1 + 0.25*position_dict2))/0.75     
    print weighted_average

I also didn't consider what should happen if two classes rank the same in a list. I would be grateful to get any hints/help on that too.

I thought that I might need to create a function to solve this, but I didn't go far with that either.

Any help as well as accompanied comments for explaining the code would be great.

So I would like to calculate the weighted average of the ranking position of the elements in the lists. e.g. the weighted average for :

class1: weighted_average = ((0.50 * 1) + (0.25 * 3))/0.75 = 1.5

class2: then the weighted_average = ((0.50 *2)+(0.25*1))/0.75 = 1.6666..7

Thank you!

HR123r
  • 181
  • 2
  • 10
  • Is the dict "sorted" by value or by key? – pzp Apr 02 '15 at 13:01
  • Hi the dictionaries are sorted by value, in ascending order. I apologise for the typo before. – HR123r Apr 02 '15 at 13:08
  • Is ```a``` and ```b``` the position of the elements in their respective sorted lists or overall? I see in your code that they had values of 0.5 and 0.25, which cannot be indices. – pzp Apr 02 '15 at 13:15
  • Just let me clarify something: You **don't** want to calculate the weighted average of the values for each class, eg `'class1': (0.5*15.17 + 0.25*9.10) / 0.75`. Is that correct? – PM 2Ring Apr 02 '15 at 13:20
  • Hi PM 2Ring, I added a few lines at the end of my question to be able to help you more. But no, you are right I don't want the weighted average of the values of each class. Please refer to end part of my question. Thank you. – HR123r Apr 02 '15 at 13:28
  • You do need to "consider what should happen if two classes rank the same in a list". I suggest that in `sorted_dict1` `class3` and `class2` should both get the same ranking - it doesn't seem fair that one should get a higher ranking than the other purely due to the order determined by the sorting algorithm. But should they both get a ranking of 2, or of 2.5... ? – PM 2Ring Apr 02 '15 at 13:44
  • Hi pzp1997, the "a" and "b" are indeed the position of the elements in sorted_dict1 and sorted_dict2 respectively. The 0.5 and 0.25 are the weight I want to place for that position in calculating the weighted average. Thank you. – HR123r Apr 02 '15 at 13:49

1 Answers1

1

I've taken the easy route and given classes of equal score the next integer rank, so class3 and class2 both have rank 2 in sorted_dict1

#!/usr/bin/env python

#Get the ranks for a list of (class, score) tuples sorted by score
#and return them in a dict
def get_ranks(sd):
    #The first class in the list has rank 1
    k, val = sd[0]
    r = 1
    rank = {k: r}

    for k, v in sd[1:]:
        #Only update the rank number if this value is 
        #greater than the previous
        if v > val:
            val = v
            r += 1
        rank[k] = r
    return rank

def weighted_mean(a, b):
    return (0.50*a + 0.25*b) / 0.75

sorted_dict1 = [('class1', 15.17), ('class2', 15.95), ('class3', 15.95)]
sorted_dict2 = [('class2', 9.10), ('class3', 9.22), ('class1', 10.60)]

print sorted_dict1
print sorted_dict2

ranks1 = get_ranks(sorted_dict1)
ranks2 = get_ranks(sorted_dict2)

print ranks1
print ranks2

keys = sorted(k for k,v in sorted_dict1)

print [(k, weighted_mean(ranks1[k], ranks2[k])) for k in keys]

output

[('class1', 15.17), ('class2', 15.949999999999999), ('class3', 15.949999999999999)]
[('class2', 9.0999999999999996), ('class3', 9.2200000000000006), ('class1', 10.6)]
{'class2': 2, 'class3': 2, 'class1': 1}
{'class2': 1, 'class3': 2, 'class1': 3}
[('class1', 1.6666666666666667), ('class2', 1.6666666666666667), ('class3', 2.0)]

In the comments I mentioned that there's a nice way to create a weighted_mean() function with custom weights. Of course, we could just pass the weights as additional arguments to weighted_mean(), but that makes the call to weighted_mean() more cluttered than it needs to be, making the program harder to read.

The trick is to use a function that takes the custom weights as arguments and returns the desired function. Technically, such a function-making function is called a closure.

Here's a short demo of how to do that.

#!/usr/bin/env python

#Create a weighted mean function with weights w1 & w2
def make_weighted_mean(w1, w2):
    wt = float(w1 + w2)
    def wm(a, b):
        return (w1 * a + w2 * b) / wt
    return wm

#Make the weighted mean function
weighted_mean = make_weighted_mean(1, 2)

#Test
print weighted_mean(6, 3)
print weighted_mean(3, 9)

output

4.0
7.0

Here's an updated version of the first program above that handles an arbitrary number of sorted_dict lists. It uses the original get_ranks() function, but it uses a slightly more complex closure than the above example to do the weighted means on a list (or tuple) of data.

#!/usr/bin/env python

''' Weighted means of ranks

    From https://stackoverflow.com/q/29413531/4014959

    Written by PM 2Ring 2015.04.03
'''

from pprint import pprint

#Create a weighted mean function with weights from list/tuple weights
def make_weighted_mean(weights):
    wt = float(sum(weights))
    #A function that calculates the weighted mean of values in seq 
    #weighted by the weights passed to make_weighted_mean()
    def wm(seq):
        return sum(w * v for w, v in zip(weights, seq)) / wt
    return wm


#Get the ranks for a list of (class, score) tuples sorted by score
#and return them in a dict
def get_ranks(sd):
    #The first class in the list has rank 1
    k, val = sd[0]
    r = 1
    rank = {k: r}

    for k, v in sd[1:]:
        #Only update the rank number if this value is 
        #greater than the previous
        if v > val:
            val = v
            r += 1
        rank[k] = r
    return rank


#Make the weighted mean function
weights = [0.50, 0.25]
weighted_mean = make_weighted_mean(weights)

#Some test data
sorted_dicts = [
    [('class1', 15.17), ('class2', 15.95), ('class3', 15.95), ('class4', 16.0)],
    [('class2', 9.10), ('class3', 9.22), ('class1', 10.60), ('class4', 11.0)]
]
print 'Sorted dicts:'
pprint(sorted_dicts, indent=4)

all_ranks = [get_ranks(sd) for sd in sorted_dicts]
print '\nAll ranks:'
pprint(all_ranks, indent=4)

#Get a sorted list of the keys
keys = sorted(k for k,v in sorted_dicts[0])
#print '\nKeys:', keys

means = [(k, weighted_mean([ranks[k] for ranks in all_ranks])) for k in keys]
print '\nWeighted means:'
pprint(means, indent=4)

output

Sorted dicts:
[   [   ('class1', 15.17),
        ('class2', 15.949999999999999),
        ('class3', 15.949999999999999),
        ('class4', 16.0)],
    [   ('class2', 9.0999999999999996),
        ('class3', 9.2200000000000006),
        ('class1', 10.6),
        ('class4', 11.0)]]

All ranks:
[   {   'class1': 1, 'class2': 2, 'class3': 2, 'class4': 3},
    {   'class1': 3, 'class2': 1, 'class3': 2, 'class4': 4}]

Weighted means:
[   ('class1', 1.6666666666666667),
    ('class2', 1.6666666666666667),
    ('class3', 2.0),
    ('class4', 3.3333333333333335)]

And here's an alternate version of get_ranks() that skips rank numbers if two or more classes rank the same in a list

def get_ranks(sd):
    #The first class in the list has rank 1
    k, val = sd[0]
    r = 1
    rank = {k: r}
    #The step size from one rank to the next. Normally 
    #delta is 1, but it's increased if there are ties.
    delta = 1

    for k, v in sd[1:]:
        #Update the rank number if this value is 
        #greater than the previous. 
        if v > val:
            val = v
            r += delta
            delta = 1
        #Otherwise, update delta
        else:
            delta += 1
        rank[k] = r
    return rank

Here's the output of the program using that alternate version of get_ranks():

Sorted dicts:
[   [   ('class1', 15.17),
        ('class2', 15.949999999999999),
        ('class3', 15.949999999999999),
        ('class4', 16.0)],
    [   ('class2', 9.0999999999999996),
        ('class3', 9.2200000000000006),
        ('class1', 10.6),
        ('class4', 11.0)]]

All ranks:
[   {   'class1': 1, 'class2': 2, 'class3': 2, 'class4': 4},
    {   'class1': 3, 'class2': 1, 'class3': 2, 'class4': 4}]

Weighted means:
[   ('class1', 1.6666666666666667),
    ('class2', 1.6666666666666667),
    ('class3', 2.0),
    ('class4', 4.0)]
Community
  • 1
  • 1
PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • Hi PM 2Ring, thank you for your answer. I am unsure what this section of code def(get_ranks(sd), does, in particular where you only update the rank number if the value is greater than the previous : and for example shouldn't the weighted average of the rank of class one be equal to 1.5 and not 1.6667? So in the sorted_dict1 the class2 and class3 indeed they have equal ranking, that is 2. And thus a class4 would had ranking 3? or 4? Thank you. – HR123r Apr 02 '15 at 14:29
  • @HR123r: `((0.50 * 1) + (0.25 * 3))/0.75 = 1.666666...`. I only update the rank number if the value is greater than the previous so that classes with equal score get the same rank, as I described in my [comment](http://stackoverflow.com/questions/29413531/retrieve-the-ranking-of-elements-in-various-list-to-compute-the-weighted-average/29415131#comment47001515_29413531). Using that scheme a class4 would have the next ranking 3, since both class2 and class3 share 2nd place: it's like they're merged into one class. – PM 2Ring Apr 02 '15 at 14:36
  • Sorry if that's not how you want the classes with equal score handled, but I _did_ make my comments a while before I started coding, and you hadn't replied... but it's not too hard to modify my `get_ranks()` function to give class4 a rank of 4 (in reference to your example above). – PM 2Ring Apr 02 '15 at 14:38
  • Hi thank you very much for your reply and your answer. Yes you are absolutely right. I am trying to multi task and I get my maths wrong. I learned a lot from your answer. Thank you! :) – HR123r Apr 02 '15 at 14:58
  • Hey PM 2Ring, I only just managed to check your answer with my code and it most certainly works. Thank you. Do you mind sharing why you chose to write the line : keys = sorted(k for k, v in sorted_dict1)? I have a vague idea but I wanted to be sure. Also if I wanted to use argparse to define the weight values (i.e. 0.50, 0.25) I shouldn't have an issue right? I mean it's not going to be difficult to do so since they are within a defined function? def weighted_mean(a, b): what do you think please? Thank you!!! – HR123r Apr 03 '15 at 14:59
  • @HR123r: `keys = sorted(k for k, v in sorted_dict1)` is just a simple way to get a sorted list of keys from a dict that works equally well on Python 2 & Python 3. As for being able to use other weights in `weighted_mean(a,b)` there's a really cool way to do that. I'll add some code to my answer to show you how, but it'll have to wait until tomorrow (it's 2AM here). – PM 2Ring Apr 03 '15 at 15:05
  • yeah, and I tried with the argparse to define the "a" and "b" within the def weighted_mean() and it works fine! :) – HR123r Apr 03 '15 at 15:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/74434/discussion-between-hr123r-and-pm-2ring). – HR123r Apr 03 '15 at 16:27
  • @PM2Ring is it possible to implement this solution with two similar lists? Lets say the 2 sorted_dicts that are provided, one of the dicts has a unique value, eg 'class5'. This program breaks at `means` if all the values in the 2 sorted_dicts are not the same. – Pysnek313 Jan 12 '21 at 19:35
  • 1
    @Pysnek313 Yes, it's possible. The crash occurs when `ranks[k]` tries to fetch the value of a non-existent key of a rank dict. We can easily fix that by using the `.get` method and supplying an appropriate default rank. Eg, `weighted_mean([ranks.get(k, default_rank) for ranks in all_ranks])`. But I don't know what an appropriate value for `default_rank` would be. ;) – PM 2Ring Jan 12 '21 at 21:12