1

I'm using Python 3.5.2 and I have a dict that contains as "keys" a Tuple of Strings, and as "values" an Integer from a count. I want to do a dual sort, where the first priority is the first string in the key, and the second priority is the integer value. See below for a more in depth explanation:

For example, I've got a Dict:

>>> print(unorderedDict.items())
dict_items([(('has', 'accomplished'), 1), (('new', 'french'), 1), (('pieces', 'machinery'), 1), (('in', 'those'), 1), (('east', 'on'), 1), (('sectarian', 'principles'), 1), ((',', 'are'), 10), (('all', 'countries'), 2)......])

It contains as the Key a Tuple of two Strings ex. ('has', 'accomplished') and also a value that is an integer ex. 1. Ex. all together: ([(('all', 'countries'), 2)]).

This essentially contains all unique combinations of words found in a text, in Tuple form as the key, along with the number of times that unique combination of words appears in the text as the value which is an Integer.

I want a way to sort the unorderedDict, 1st by the first string in the key tuple and 2nd by the value.

The purpose of this is so that I will have a list of words, plus the word most likely to follow it, and next in the list that same word plus the next most likely word to follow it in the text.

Example output:

dict_items([(('all', 'the'), 10), (('all', 'of'), 7), (('big', 'drums), 12), (('big', 'dogs') 6)......])

Notice how it is first sorted by the first string in the tuple (alphabetically), and second by the value (numerically highest to lowest).

What Python 3 code would I need in order to perform this type of sorting algorithm?

The main reason for needing this sorting algorithm is so that I can randomly select one of the first strings in the tuple and get the second string in the tuple that is found more often (identified by the Integer from Count).

For example, I could randomly select 'all' and see that it is more likely to be followed by 'the' than it is by 'of' (vount of 'the' = 10, vount of 'of' = 7).

From my own research I assume it will have something to do with built-in dict sort methods and lambdas perhaps, but this is new territory for me, so I don't really have a clue.

MSeifert
  • 145,886
  • 38
  • 333
  • 352
Ali
  • 51
  • 7
  • 1
    Dicts are unordered in Python 3.5. The language definition is shifting towards preserving dict order, but even as of 3.6, that's considered an implementation detail. – user2357112 Jan 30 '17 at 06:30
  • Would it be best to loop through the Dict and place it all in a List instead, in that case? And then sort the List? – Ali Jan 30 '17 at 06:33
  • 1
    @Ali. I think you have the wrong data structure. It should instead be: `{'all': {'the': 10, 'of': 7}, 'big': {'drums': 12, 'dogs': 6}, ...}`. Then you can look up a word at random, and sort the inner dict by its values: `sorted(data['all'].items(), key=lambda i: i[1])`. – ekhumoro Jan 30 '17 at 07:14
  • 1
    `sorted(unorderedDict.items(), key=lambda x: (x[0][0], x[1]))` – poke Jan 30 '17 at 07:54
  • 1
    [Sort a Python dictionary by value](http://stackoverflow.com/q/613183/216074) and [Sort a list by multiple attributes?](http://stackoverflow.com/q/4233476/216074) – Combine those two and you know how to solve this. – poke Jan 30 '17 at 07:58
  • 1
    @poke I think it's the wrong target because even though the question asks about sorting - the solution doesn't require any sorting! – MSeifert Jan 30 '17 at 17:51
  • 1
    @MSeifert Well, the question is maybe 15% about the actual underlying problem that can be solved otherwise. If we were to clear the question up to calrify the obvious [XY problem](http://meta.stackexchange.com/a/66378/141542), I don’t think the question would retain enough content to justify being open anyway (no example, no attempted solution etc). So I don’t feel like reopening it the way it is right now. Feel free to attempt to salvage it though and reopen it. – poke Jan 30 '17 at 19:33
  • Thank you for the comments. I'm new to asking questions on this forum, and it is really helpful to get information like this to help me ask better questions in the future and makes me realize I need to step out of the box and reconsider the structure of what I was trying to accomplish, as I was obviously going about it the wrong way from the start. Anyway, I appreciate the constructive criticism. Thank you! – Ali Feb 08 '17 at 01:44

1 Answers1

1

Basically this can be done with an OrderedDict:

from collections import OrderedDict
OrderedDict(sorted(unorderedDict.items(), key=lambda x: (x[0][0], x[1])))
#                                 first string of key----^^^^^^^  ^^^^---value

However I think you should consider using another data structure. For example an unordered dict of lists seems like a good option because you're only interested in the most common word following the first one:

import bisect
unorderedDict = dict([(('has', 'accomplished'), 1),  (('has', 'done'), 5), 
                      (('new', 'french'), 1), (('has', 'failed'), 3), 
                      (('pieces', 'machinery'), 1), (('in', 'those'), 1), 
                      (('east', 'on'), 1), (('sectarian', 'principles'), 1), 
                      ((',', 'are'), 10), (('all', 'countries'), 2)])
result = {}

for (key1, key2), counts in unorderedDict.items():
    if key1 not in result:
        # add a new key
        result[key1] = [(counts, key2)]
    else:
        # We want the lists to be sorted so we can use bisection to do this quite efficient
        bisect.insort_left(result[key1], (counts, key2))

>>> print(result)
{'sectarian': [(1, 'principles')], 
 'pieces': [(1, 'machinery')], 
 ',': [(10, 'are')], 
 'all': [(2, 'countries')], 
 'has': [(1, 'accomplished'), (3, 'failed'), (5, 'done')],   # sorted from low to high!
 'new': [(1, 'french')], 
 'in': [(1, 'those')], 
 'east': [(1, 'on')]}

The outer dictionary isn't ordered because I suspect that it doesn't need to be (and if it should be then I don't know how).


An alternative could be collections.Counter as inner-structure because it has a nice .most_common method:

from collections import Counter

result = {}

for (key1, key2), counts in unorderedDict.items():
    if key1 not in result:
        result[key1] = Counter()
    result[key1][key2] = counts

>>> result['has'].most_common()  # returns it sorted!!!
[('done', 5), ('failed', 3), ('accomplished', 1)]

>>> result['has'].most_common(1)
[('done', 5)]

>>> result['has']['failed']  # can be accessed like a dictionary too
3
MSeifert
  • 145,886
  • 38
  • 333
  • 352