Sorting data in a dictionary alphabetically in python

Question

I am learning how to use a dictionary to group the name and score together. It prints out from high to low score. I am trying to get it to only print out the highest score per person which I believe is using MAX but I can't do it. Any ideas?

I also need to calculate the average score per student, so if they had 3 scores is that using len?

scores = {}
resultfile = open("results.txt")
for line in resultfile:
    (name, score) = line.split()
    scores[score]=name
resultfile.close()

print("The top scores were:")
for each_score in sorted(scores.keys(), reverse = True):
    print(scores[each_score] + each_score)

*but I can't do it* ... Why? What is the problem that you are facing with the code? — Bhargav Rao, Apr 18 '15 at 10:37
Since this is almost certainly about your [GCSE programming problem](http://www.reddit.com/r/Python/comments/2gawvg/gcse_computing_programming_tasks_14_16_year_olds/), please do read [Open letter to students with homework problems](http://meta.programmers.stackexchange.com/q/6166). — Martijn Pieters, Apr 18 '15 at 10:41
possible duplicate of [How can I sort a Python dictionary sort by key?](http://stackoverflow.com/questions/9001509/how-can-i-sort-a-python-dictionary-sort-by-key) — Nir Alfasi, Apr 18 '15 at 10:45

Martijn Pieters · Answer 1 · 2015-04-18T10:43:54.570

You need to ensure you are working with the correct data types here; to sort scores (integers) you want to convert from str to int; strings are sorted lexicographically (first characters first, then second, etc, just like alphabetising), integers are compared numerically. So the string '10' sorts before '9', but the integer 10 sorts after 9.
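
A minimal sketch of that difference, using made-up score values rather than the contents of the asker's results.txt:

```python
# Hypothetical scores as line.split() would produce them: all strings.
raw_scores = ["9", "10", "7"]

# Strings compare character by character, so "10" lands before "7" and "9".
sorted_as_text = sorted(raw_scores)

# Converting to int first gives numeric ordering.
sorted_as_numbers = sorted(int(s) for s in raw_scores)

print(sorted_as_text)     # ['10', '7', '9']
print(sorted_as_numbers)  # [7, 9, 10]
```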

You also need to store your scores in lists for each name, not just store the last name and score:

scores = {}
resultfile = open("results.txt")
for line in resultfile:
    name, score = line.split()
    score = int(score)
    scores.setdefault(name, []).append(score)
resultfile.close()

Now you have a mapping from name -> [score1, score2, score3, ...].

You need to write a sorting key here; one that returns the maximum score for a given key in the dictionary:

sorted(scores, key=lambda key: max(scores[key]), reverse=True)

The key argument of the sorted() function must be a function, and it is given each element that is being sorted in turn, and should return the value by which to sort. If scores[key] is all scores for that given user, then max(scores[key]) will be their highest score.
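
To make the role of the key function more concrete, here is a small sketch with invented names and scores. The helper highest_score is a hypothetical name introduced only for illustration; it does the same job as the lambda:

```python
# Made-up data in the name -> [scores] shape described above.
scores = {"Adam": [5, 3], "Mark": [8, 2], "Zac": [4, 4]}

def highest_score(name):
    # sorted() calls this once per key and orders the keys by the result.
    return max(scores[name])

by_def = sorted(scores, key=highest_score, reverse=True)
by_lambda = sorted(scores, key=lambda key: max(scores[key]), reverse=True)

print(by_def)               # ['Mark', 'Adam', 'Zac']
print(by_def == by_lambda)  # True
```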

If your scores were already sorted from highest score to lowest, then you don't need a key function as sequences are compared lexicographically.

Next, if you need to display the highest score, then max() is all you need:

sorted_by_highest_score = sorted(scores, key=lambda key: max(scores[key]), reverse=True)
for name in sorted_by_highest_score:
    highest_score = max(scores[name])
    print(name, highest_score)

To calculate the average, all you need to do is take the sum of the scores divided by the number of scores, so using len():

average = sum(scores[name]) / len(scores[name])
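
As a sketch of how that line might sit inside a loop over every student (the data is invented, and Python 3 is assumed so that / is true division):

```python
# Made-up data; in the real program this dict is built from results.txt.
scores = {"Adam": [5, 3, 4], "Mark": [8, 2]}

averages = {}
for name in sorted(scores):  # alphabetical order by name
    # Sum of this student's scores divided by how many scores they have.
    averages[name] = sum(scores[name]) / len(scores[name])
    print(name, averages[name])
```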

@PadraicCunningham: ah, yes, they are reading from a file; updating. — Martijn Pieters, Apr 18 '15 at 10:44
@PadraicCunningham: and swap the key and value, and collect the scores in lists. — Martijn Pieters, Apr 18 '15 at 10:48

unutbu · Answer 2 · 2015-04-18T21:54:40.763

Your main problem is that you've mapped scores to names:

scores[score]=name

What if two people have the same score? scores[score] = name would overwrite (lose) one of the names, since a dict can only map one key (e.g. score) to one value (e.g. name). Therefore, instead, you need to map names to a list of scores:

scores.setdefault(name, []).append(score)

The setdefault method returns scores[name] if name is in scores, and returns a new empty list [], assigned to scores[name], otherwise.
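
A short sketch of that behaviour, with an invented name and invented scores:

```python
scores = {}

# First time "Mark" is seen: setdefault stores a new list and returns it.
first = scores.setdefault("Mark", [])
first.append(8)

# Second time: the existing list is returned and the [] argument is ignored.
second = scores.setdefault("Mark", [])
second.append(6)

print(scores)           # {'Mark': [8, 6]}
print(first is second)  # True
```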

Sorting:

With scores being a dict mapping names to lists of scores, sorting the names alphabetically is easy: you could use sorted(scores).

To sort scores by maximum score from highest to lowest, you could use

sorted(scores, key=lambda name: max(scores[name]), reverse=True)

See HOWTO Sort for an excellent tutorial on sorting, including the use of the key parameter.

Keeping the last three values:

To store just the last three values for each name, you could use a collections.deque, which is a list-like container which can have a maximum length. As items are appended to the deque, older items are dropped if the maximum length has been reached.

For example, here is a deque with maximum length 3:

In [100]: d = collections.deque(maxlen=3)

We can insert three values:

In [101]: d.extend([1,2,3])

In [102]: d
Out[102]: deque([1, 2, 3], maxlen=3)

But when we insert a fourth value, only the last three are kept:

In [103]: d.append(4)

In [104]: d
Out[104]: deque([2, 3, 4], maxlen=3)

Thus, to sort the names according to the maximum of the last 3 scores per person, you could use:

import collections

scores = {}
with open("results.txt") as resultfile:
    for line in resultfile:
        name, score = line.split()
        scores.setdefault(name, collections.deque(maxlen=3)).append(float(score))

print("The top and average scores were:")
for name in sorted(scores, key=lambda name: max(scores[name]), reverse=True):
    ave = sum(scores[name])/len(scores[name])
    m = max(scores[name])
    print('{name}: {m} {a}'.format(name=name, m=m, a=ave))

An alternative to avoid the double computation:

One weakness of the above code is that the quantity max(scores[name]) is computed twice: once in the call to sorted, and once inside the for-loop.

One way to avoid this double computation is to precompute the values once and store the results in a list, data:

data = []
for name, vals in scores.items():
    m = max(vals)
    ave = sum(vals)/len(vals)
    data.append((ave, name, m))

data is now a list of tuples. Each tuple has the form (ave, name, m). Sorting a list of tuples is done lexicographically. The tuples are sorted according to the first element, with the second element used to break ties, and then the third to break any remaining ties, and so on.
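
A small sketch of that tuple ordering, using invented (ave, name, m) values. Note that reverse=True reverses the tie-breaking as well, so among equal averages the names come out in reverse alphabetical order:

```python
# Hypothetical (ave, name, m) tuples; two students share the same average.
data = [(5.0, "Zac", 7), (6.5, "Mark", 8), (5.0, "Adam", 6)]

# Compared by the first element, then the second, then the third.
ordered = sorted(data, reverse=True)

print(ordered)
# [(6.5, 'Mark', 8), (5.0, 'Zac', 7), (5.0, 'Adam', 6)]
```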

So

for ave, name, m in sorted(data, reverse=True):
    print('{name}: {m} {a}'.format(name=name, m=m, a=ave))

would iterate over the tuples in data, from highest average to lowest average, and the averages are only computed once. The disadvantage of doing it this way is that it requires more memory (to store data). So each of the two options shown above has a pro and a con: the first method requires less memory, the second requires less computation.

edited Apr 18 '15 at 21:54

answered Apr 18 '15 at 10:46

Thank you. If the system only needs to store the last 3 scores, what is the easiest way to overwrite the older ones? – Canadian1010101 Apr 18 '15 at 16:30
You could use a `collections.deque` with a maximum length of 3. I've edited the post above to show what I mean. – unutbu Apr 18 '15 at 16:50
Ok, will look into collections. Thanks. What is the best way to sort highest to lowest by score. At present, it sorts alphabetically? Do I just change where I sort or swap around the score and name? – Canadian1010101 Apr 18 '15 at 17:47
`sorted(scores)` sorts the names (or, more generally, the keys of the `scores` dict) alphabetically. To sort the names by the maximum score, you would need to use the `key` parameter: `sorted(scores, key=lambda name: max(scores[name]), reverse=True)`. I've added a bit more about this above. – unutbu Apr 18 '15 at 17:54
Note that you must not swap the score and the name -- the name *must* be the key and the scores *must* be values. If you were to swap these roles, then you would run into the collision problem -- two people with the same score could not be represented by the data structure because one score could only be mapped to one name, not two. Names may uniquely map to scores, but scores can not uniquely map to names. – unutbu Apr 18 '15 at 17:58
and where does it go? below for name in sorted(scores) ? – Canadian1010101 Apr 18 '15 at 18:24
`sorted(scores)` can be replaced by `sorted(scores, key=...)`. Try it in an interactive session to get a better idea of what these expressions are returning. – unutbu Apr 18 '15 at 20:00
Ok, what have I done wrong here. I've followed the logic I believe, but won't print out?............................................................ import collections scores = {} with open("results.txt") as resultfile: for line in resultfile: name, score = line.split() scores.setdefault(name, collections.deque(maxlen=3)).append(float(score)) print("In alphabetical order by student, the scores are:") for name in sorted(scores): sorted(scores, key=lambda name: max(scores[name]), reverse=True) print(sorted) – Canadian1010101 Apr 18 '15 at 20:30
It's even simpler than what you're trying. You literally just substitute `sorted(scores, key=lambda name: max(scores[name]), reverse=True)` in place of `sorted(scores)`. Note that both of these expressions returns a list of names (keys in `scores`). `dict` is an iterable. When you iterate over a dict, it iterates over the keys in the dict. The `sorted` function returns the items in the iterable passed to the function in some sorted order. So both calls to `sorted(scores, ...)` return a list of names. You just replace one for the other depending on how you wish to sort. – unutbu Apr 18 '15 at 21:22
How do you sort the average scores so it is sorted by highest to lowest in terms of the score and not alphabetically e.g. Mark 8, Matt 7, Adam 5, Zac 4.... is it reverse=true? print("The average scores were - sorted in decending order:") for name in sorted(scores): ave = sum(scores[name])/len(scores[name]) print('{name}: {a}'.format(name=name, a=ave)) – Canadian1010101 Apr 18 '15 at 21:36
I've figured it out: print("The average scores were:") for name in sorted(scores, key=lambda name: max(scores[name]), reverse=True): ave = sum(scores[name])/len(scores[name]) print('{name}: {a}'.format(name=name, a=ave)) – Canadian1010101 Apr 18 '15 at 21:40
To sort by average score (as opposed to alphabetically or by maximum score) you would use `sorted(scores, key=lambda name: sum(scores[name])/len(scores[name]), reverse=True)`. – unutbu Apr 18 '15 at 21:40
Thank you for your help today - I've learnt so much and your support and guidance has directed me to various sites to focus my training. – Canadian1010101 Apr 18 '15 at 21:40
Can you tell me how mine is different to yours? for name in sorted(scores, key=lambda name: max(scores[name]), reverse=True): ave = sum(scores[name])/len(scores[name]) print('{name}: {a}'.format(name=name, a=ave)) – Canadian1010101 Apr 18 '15 at 21:46
When you use `key=lambda name: max(scores[name])`, you are sorting the names according to the **maximum** score per name. When you use `key=lambda name: sum(scores[name])/len(scores[name])` you are sorting the names according to the **average** of the scores per name. – unutbu Apr 18 '15 at 21:56
The `key` parameter expects a function be passed to it. That function could be defined by a `def`-statement, or by using a [`lambda` function](http://www.diveintopython.net/power_of_introspection/lambda_functions.html). In the lambda functions above, the part after the colon, `max(scores[name])` or `sum(scores[name])/len(scores[name])` is the value returned by the lambda function given a `name` as its argument. `sorted` is calling the function internally, and sorting the names according to this proxy value. – unutbu Apr 18 '15 at 22:10
When I have a .txt file as Michael 4, Michael 3 in a list, one below the other it works fine. If however, the .txt file has the name and scores stored in one long line it won't work and shows an error about too many values to unpack. Is this because the line split reads them and can only look for two items for line? – Canadian1010101 Apr 30 '15 at 19:22
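
On the last comment: line.split() with no arguments splits on every run of whitespace, so a single long line yields more than two items and the two-name unpacking fails. One possible way to handle that layout is sketched below, assuming the line is purely whitespace-separated name/score pairs with no commas. The sample line is invented:

```python
line = "Michael 4 Michael 3 Sarah 9"  # hypothetical one-line file contents

tokens = line.split()  # ['Michael', '4', 'Michael', '3', 'Sarah', '9']

scores = {}
# Walk the tokens two at a time: even positions are names, odd ones scores.
for name, score in zip(tokens[::2], tokens[1::2]):
    scores.setdefault(name, []).append(int(score))

print(scores)  # {'Michael': [4, 3], 'Sarah': [9]}
```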

score 0 · Answer 3 · answered Apr 18 '15 at 10:58

One potential problem I see in your code is that students with the same score can be lost. Consider this part:

(name, score) = line.split()
scores[score]=name

Instead of checking whether scores[score] already exists, you overwrite the previous name. I don't know what the results look like, so it may be that every student happens to have a unique score.

Now to answer your question:

I am trying to get it to only print out the highest score per person which I believe is using MAX but I can't do it. [Why is that you ask?]

You want to print out the highest score per person stored in the dictionary, but how you store the data does not make this simple.

I suggest you use the name as the key and store the score(s) as the value; then, when you iterate over the dictionary, you can use max() to get the highest score.

As for the average, I advise you follow the suggestion I wrote above.

I hope this helps.
