Sort a complex Python dictionary by just one of its values

Question

I am writing a little optimization tool for purchasing stamps at the post office.

In the process I am using a dictionary, which I am sorting according to what I learned in this other "famous" question: Sort a Python dictionary by value

In my case my dictionary is mildly more complex:
- one four-item-tuple to make the key
- and another five-item-tuple to make the data.

The origin of this dictionary is an iteration, where each successful loop is adding one line:

MyDicco[A, B, C, D] = eval, post, number, types, over

This is just a tiny example of a trivial run, trying for 75 cents:
{
(0, 0, 1, 1): (22, 75, 2, 2, 0)
(0, 0, 0, 3): (31, 75, 3, 1, 0)
(0, 0, 2, 0): (2521, 100, 2, 1, 25)
(0, 1, 0, 0): (12511, 200, 1, 1, 125)
(1, 0, 0, 0): (27511, 350, 1, 1, 275)
}

So far I am using this code to sort (is is working):

MyDiccoSorted = sorted(MyDicco.items(), key=operator.itemgetter(1))

I am sorting by my evaluation-score, because the sorting is all about bringing the best solution to the top. The evaluation-score is just one datum out of a five-item-tuple (in the example those are the evaluation-scores: 22, 31, 2521, 12511 and 27511).

As you can see in the example above, it is sorting (as I want it) by the second tuple, index 1. But I had to (grumpily) bring my "evaluation-score" to the front of my second tuple. The code is obviously using the entire second-tuple for the sorting-process, which is heavy and not needed.

Here is my question: How can I please sort more precisely. I do not want to sort by the entire second tuple of my dictionary: I want to target the first item precisely.
And ideally I would like to put this value back to its original position, namely to be the last item in the second tuple - and still sort by it.

I have read-up on and experimented with the syntax of operator.itemgetter() but have not managed to just "grab" the "first item of my second item". https://docs.python.org/3/library/operator.html?highlight=operator.itemgetter#operator.itemgetter

(note: It is permissible to use tuples as keys and values, according to:
https://docs.python.org/3/tutorial/datastructures.html?highlight=dictionary and those are working fine for my project; this question is just about better sorting)

For those who like a little background (you will yell at me that I should use some other method, but I am learning about dictionaries right now (which is one of the purposes of this project)):

This optimization is for developing countries, where often certain values of stamps are not available, or are limited in stock at any given post office. It will later run on Android phones.

We are doing regular mailings (yes, letters). Figuring out the exact postage for each destination with the available values and finding solutions with low stocks of certain values is a not-trivial process, if you consider six different destination-based-postages and hundreds of letters to mail.

There are other modules which help turning the theoretical optimum solution into something that can actually be purchased on any given day, by strategic dialog-guidance...

About my dictionary in this question: I iterate over all reasonable (high enough to make the needed postage and only overpaying up to a fraction of one stamp) combinations of stamp-values.

Then I calculate a "success" value, which is based on the number of stamps needed (priority), the number of types needed (lower priority)(because purchasing different stamps takes extra time at the counter) and a very high penalty for paying-over. So lowest value means highest success.

I collect all reasonable "solutions" in a dictionary where the tuple of needed-stamps serves as the key, and another tuple of some results-data makes up the values. It is mildly over-defined because a human needs to read it at this phase in the project (for debugging).

If you are curious and want to read the example (first line):
The colums are:

number of stamps of 350 cents
number of stamps of 200 cents
number of stamps of 50 cents
number of stamps of 25 cents
evaluation-score
calculated applied postage
total number of stamps applied
total number of stamp-types
over-payment in cents if any

Or in words: (Assuming a postal service is offering existing stamps of 350, 200, 50 and 25 cents), I can apply postage of 75 cents by using 1x 50 cents and 1x 25 cents. This gives me a success-rating of 22 (the best in this list), postage is 75 cents, needing two stamps of two different values and having 0 cents overpayment.

This was my first ever question on stackoverflow and I am impressed and very thankful for the response: Superfast and very, very helpful. I just decided to select the first answer, because it does what I need (I have already tested it with my project). I have mainly selected so fast, that no other users will need to spend time with more answers. I am happy and can move on the the next modules. Thank you all!! — Martin Zaske, Aug 25 '15 at 16:53

score 4 · Answer 1 · answered Aug 25 '15 at 16:35

I find it easier to use lambda expressions than to remember the various operator functions.

Assuming, for the moment, that your eval score is the 3rd item of your value tuple (i.e. (post, number, eval, types, over):

MyDiccoSorted = sorted(MyDicco.items(), key=lamba x:x[1][2])

Alternatively, you can create a named function to do the job:

def myKey(x): return x[1][2]
MyDiccoSorted = sorted(MyDicco.items(), key=myKey)

score 4 · Accepted Answer · answered Aug 25 '15 at 16:35

4

You can just use a double index, something like this should work:

MyDiccoSorted = sorted(MyDicco.items(), key=lambda s: s[1][2])

Just set 2 to whatever the index is of the ID in the tuple.

answered Aug 25 '15 at 16:35

Morgan Thrapp

9,748
3
46
67

Thanks, awesomely fast answer. I see in your answer (and from Rob) the syntax key=lambda s: s[1][4]. Assuming that in pythonian way [4] will give me the fifth item of my values-tuple. What does the first [1] do please? Is it a reference to the second item of my keys-tuple? Or where can I please look-up the syntax for this key=lambda solution? – Martin Zaske Aug 25 '15 at 16:41
1

@MartinZaske Yup. What that does is take in each element of `MyDicco` and assigns it to `s`. Then it gets the first element of `s`, and returns the fourth element of that. – Morgan Thrapp Aug 25 '15 at 16:43
This would therefore not look at the keys at all. You mean the values, when you write "each element", right? That would be a perfect answer... – Martin Zaske Aug 25 '15 at 16:46
1

@MartinZaske Right, it'll just take the second element of the tuple returned by `.items()`, (or the values) and sort by the third element of that. – Morgan Thrapp Aug 25 '15 at 16:50

Anand S Kumar · Answer 3 · 2015-08-25T16:40:54.217

You can use a lambda expression instead of operator.itemgetter() , to get the precise element to sort on. Assuming your eval is the first item in the tuple of values, otherwise use the index of the precise element you want in x[1][0] .Example -

MyDiccoSorted = sorted(MyDicco.items(), key=lambda x: x[1][0])

How this works -

A dict.items() returns something similar to a list of tuples (though not exactly that in Python 3.x) , Example -

>>> d = {1:2,3:4}
>>> d.items()
dict_items([(1, 2), (3, 4)])

Now, in sorted() function, the key argument accepts a function object (which can be lambda , or operator.itemgetter() which also return a function, or any simple function) , the function that you pass to key should accept one argument, which would be the element of the list being sorted.

Then that key function is called with each element, and you are expected to return the correct value to sort the list on. An example to help you understand this -

>>> def foo(x):
...     print('x =',x)
...     return x[1]
...
>>> sorted(d.items(),key=foo)
x = (1, 2)
x = (3, 4)
[(1, 2), (3, 4)]

score 1 · Answer 4 · answered Aug 25 '15 at 16:35

1

does this do what you need?

sorted(MyDicco.items(), key=lambda x: x[1][0])

answered Aug 25 '15 at 16:35

scytale

12,346
3
32
46

score 1 · Answer 5 · answered Aug 25 '15 at 16:35

1

index_of_evaluation_score = 0
MyDiccoSorted = sorted(MyDicco.items(), key=lambda key_value: key_value[1][index_of_evaluation_score])

answered Aug 25 '15 at 16:35

Tom Dalton

6,122
24
35

score 1 · Answer 6 · answered Aug 25 '15 at 16:47

Placing your evaluation score back at the end where you wanted it, you can use the following:

MyDicco = {
    (0, 0, 1, 1): (75, 2, 2, 0, 22),
    (0, 0, 0, 3): (75, 3, 1, 0, 31),
    (0, 0, 2, 0): (100, 2, 1, 25, 2521),
    (0, 1, 0, 0): (200, 1, 1, 125, 12511),
    (1, 0, 0, 0): (350, 1, 1, 275, 27511)} 

MyDiccoSorted = sorted(MyDicco.items(), key=lambda x: x[1][4])

print MyDiccoSorted

Giving:

[((0, 0, 1, 1), (75, 2, 2, 0, 22)), ((0, 0, 0, 3), (75, 3, 1, 0, 31)), ((0, 0, 2, 0), (100, 2, 1, 25, 2521)), ((0, 1, 0, 0), (200, 1, 1, 125, 12511)), ((1, 0, 0, 0), (350, 1, 1, 275, 27511))]

Nic · Answer 7 · 2018-10-04T00:04:26.963

I think one of the things you might be looking for is a stable sort.

Sorting functions in Python are generally "stable" sorts. For example, if you sort:

by its first column, you'll get:

The order of rows sharing the same value in column 1 does not change. 1 4 6 is sorted before 1 2 3 because that was the original order of these rows before the column 1 sort. Sorting has been 'stable' since version 2.2 of Python. More details here.

On another note I'm interested in how much you had to explain your code. That is a sign that the code would benefit from refactoring to make its purpose clearer.

Named tuples could be used to remove the hard-to-read tuple indices you see in many answer here, e.g. key=lambda x: x[1][0]-- what does that actually mean? What is it doing?

Here's a version using named tuples that helps readers (most importantly, you!) understand what your code is trying to do. Note how the lambda now explains itself much better.

from collections import namedtuple

StampMix = namedtuple('StampMix', ['c350', 'c200', 'c50', 'c25'])
Stats = namedtuple('Stats', ['score', 'postage', 'stamps', 'types', 'overpayment'])

data = {
    (0, 0, 1, 1): (22, 75, 2, 2, 0),
    (0, 0, 0, 3): (31, 75, 3, 1, 0),
    (0, 0, 2, 0): (2521, 100, 2, 1, 25),
    (0, 1, 0, 0): (12511, 200, 1, 1, 125),
    (1, 0, 0, 0): (27511, 350, 1, 1, 275)
}

candidates = {}
for stampmix, stats in data.items():
    candidates[StampMix(*stampmix)] = Stats(*stats)

print(sorted(candidates.items(), key=lambda candidate: candidate[1].score))

You can see the benefits of this approach in the output:

>>> python namedtuple.py
(prettied-up output follows...)
[
    (StampMix(c350=0, c200=0, c50=1, c25=1), Stats(score=22, postage=75, stamps=2, types=2, overpayment=0)),
    (StampMix(c350=0, c200=0, c50=0, c25=3), Stats(score=31, postage=75, stamps=3, types=1, overpayment=0)),
    (StampMix(c350=0, c200=0, c50=2, c25=0), Stats(score=2521, postage=100, stamps=2, types=1, overpayment=25)),
    (StampMix(c350=0, c200=1, c50=0, c25=0), Stats(score=12511, postage=200, stamps=1, types=1, overpayment=125)),
    (StampMix(c350=1, c200=0, c50=0, c25=0), Stats(score=27511, postage=350, stamps=1, types=1, overpayment=275))
]

and it will help with your algorithms too. For example:

def score(stats):
    return stats.postage * stats.stamps * stats.types + 1000 * stats.overpayment

Very nice and helpful answer. Thank you. This project is now history for me. But I +1 your answer because it explains this aspect of sorting and of naming very well. And I might come back here later for other projects. — Martin Zaske, Oct 05 '18 at 10:23

Sort a complex Python dictionary by just one of its values

7 Answers7

Linked