-1

I'm learning Python at the moment and was puzzled about iterating speeds when cycling through dictionaries. In one of the tutorials we had to iterate through a dictionary and extract 'key' items for a hypothetical supermarket. I asked a question regarding best practice principles to iterate through a dictionary and was told that sorting a dictionary for iteration purposes doesn't really matter until you get to handling 'big' data sets so I shouldn't worry about it at all.

I wasn't certain why the tutor said it doesn't matter as I believe that speed is key to processing large data sets. I did some reading and found a very useful post (Python: List vs Dict for look up table) regarding this.

From this, can I assume that depending on the task, sorting of the dictionary is situational? Or would you say one should always sort a dictionary for optimal processing speeds?

To put this in more context - let's use the following example: Say we are searching for the price of a bunch of cashews in a dictionary which has 10,000 entries. In this case, if the entries were placed in a random manner in the dictionary - would the speed in searching for that entry be 'faster' if it were sorted, rather than randomly placed anywhere?

Thank you very much!

Community
  • 1
  • 1
IronKirby
  • 708
  • 1
  • 7
  • 24
  • 4
    Python dictionaries are implementations of hash functions. See https://en.wikipedia.org/wiki/Hash_table and http://stackoverflow.com/questions/114830/is-a-python-dictionary-an-example-of-a-hash-table – Alexander Oct 07 '15 at 04:28
  • dictionaries are unsorted collections ... however they have very fast item lookups (O(1)) – Joran Beasley Oct 07 '15 at 04:33
  • 1
    *Sorting* a dictionary? Why would that improve speed in any way whatsoever? – user2357112 Oct 07 '15 at 04:33
  • @user2357112: Sorry about this - I'm still new to Python so I may have used the wrong terminology. I'm just remembering what my Tutor had mentioned. Is sorting a dictionary redundant because it is a Hash_table so inherently it has fast look up times which won't be impacted by sorting..? – IronKirby Oct 07 '15 at 04:38
  • 1
    I don't know why you're expecting sorting to help anything. It's like putting your car parts in alphabetical order to make your car go faster. – user2357112 Oct 07 '15 at 04:40
  • @user2357112: That's a very helpful analogy! To add some more context - say we are searching for the price of a bunch of cashews in a dictionary which has 10,000 entries let's say. In this case, if the entries were placed in a random manner in the dictionary - would the speed in searching for that entry be 'faster' if it were sorted, rather than randomly placed anywhere? – IronKirby Oct 07 '15 at 04:44
  • 1
    Dictionary are unordered mappings, but if you want to have a sorted dictionary checkout `collections` module of python. It has OrderedDict -https://docs.python.org/2/library/collections.html#collections.OrderedDict – garg10may Oct 07 '15 at 04:45
  • @garg10may - Thanks for that! I'll take a look at that functionality! Would it be beneficial to always use OrderedDict or would you sometimes prefer having your data in a random manner? – IronKirby Oct 07 '15 at 04:46
  • @azurekirby definitely there would be some caveat, OrderedDict take up much more space etc. http://stackoverflow.com/questions/31197136/advantages-of-dict-over-ordereddict – garg10may Oct 07 '15 at 04:48
  • A dict naturally arranges its parts in a way optimized for fast lookup, in a way incompatible with sorting. If you reached into the dict and rearranged the entries, you would destroy it as surely as if you alphabetized your car. An OrderedDict is like a car with an ordered list of its parts and where they go; it does not physically arrange its parts in that order. (Also, the order of an OrderedDict doesn't have anything to do with sorted order.) – user2357112 Oct 07 '15 at 04:50
  • 1
    You don't need an `OrderedDict`, you need to understand that dictionaries are arbitrarily ordered. If you want to go through them in a certain order, you'll need to take the time to set up that order. If you just want to go through the dictionary, or look up an item, sorting doesn't even enter the picture. – TigerhawkT3 Oct 07 '15 at 04:53
  • Possible duplicate of [Why is the order in Python dictionaries and sets arbitrary?](http://stackoverflow.com/questions/15479928/why-is-the-order-in-python-dictionaries-and-sets-arbitrary) – TigerhawkT3 Oct 07 '15 at 04:56
  • Thanks a lot @TigerhawkT3. The link was very useful! – IronKirby Oct 07 '15 at 05:01
  • @azurekirby so the conclusion is either it's time to change your tutor or your teacher was taking about list. Becuase a dict anyhow can't be sorted. list can be sorted. Access time is improved once list is sorted. It won't matter if list is small. – garg10may Oct 07 '15 at 05:03
  • @garg10may: no he definitely was talking about Dictionaries.... Then again he's a 3rd year student so.....maybe he made a mistake? But regardless, thank you for your input! It was helpful! – IronKirby Oct 07 '15 at 05:05

2 Answers2

1

To put this in more context - let's use the following example: Say we are searching for the price of a bunch of cashews in a dictionary which has 10,000 entries. In this case, if the entries were placed in a random manner in the dictionary - would the speed in searching for that entry be 'faster' if it were sorted, rather than randomly placed anywhere?

Well... dictionaries already do have a sorting, since they are hashtables. The difference is that they're sorted by their hash rather than by the key itself. This means that once the hash has been calculated there is essentially nothing more that can be done to speed access up further. Gains can be found in the hash algorithm rather than in the items or structure themselves.

Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
  • Great - thanks for that! I was under the impression that everything, even dictionaries (which I now understand are hash-tables) could be optimised further..! I was reading this to further my understanding (http://cs.stackexchange.com/questions/249/when-is-hash-table-lookup-o1) - so would it be correct of me to say that, excluding those situations mentioned in the link, having an unsorted dictionary is perfectly fine as it won't impact querying speed? – IronKirby Oct 07 '15 at 04:53
  • Dictionaries are not "unsorted", they are simply sorted in a manner that the user cannot take advantage of. Hashtables can be optimized by tuning the hash algorithm to the key population. – Ignacio Vazquez-Abrams Oct 07 '15 at 05:03
1

To put this in more context - let's use the following example: Say we are searching for the price of a bunch of cashews in a dictionary which has 10,000 entries. In this case, if the entries were placed in a random manner in the dictionary - would the speed in searching for that entry be 'faster' if it were sorted, rather than randomly placed anywhere?

Its not really important how items are placed, its important how they are retrieved - because this is essentially how you are measuring the performance of the object.

Dictionaries employ a hash-table in order to retrieve items by key. This means that it doesn't matter what order the items are stored, because the retrieval speed/method/function does not rely on the order of insertion.

In other words, when you have a dictionary d and you do an operation such as:

print(d[some_key])

The retrieval of the value of some_key is not dependent of the order it was inserted into the dictionary. It would be retrieved with the same work efficiency had it been the first, second or last item inserted into the dictionary.

Burhan Khalid
  • 169,990
  • 18
  • 245
  • 284
  • Thanks Burhan - that's very interesting. Could you elaborate a bit on what you mean when you say how are they retrieved? So far I've just learnt to use a for-loop to iterate through the list and spit out the values I'm looking for – IronKirby Oct 07 '15 at 04:55
  • What I mean is, when you request an item from a dictionary by referencing a key. See the update. – Burhan Khalid Oct 07 '15 at 04:58
  • Great Thanks Burhan! That helped me understand it – IronKirby Oct 07 '15 at 05:00
  • @BurhanKhalid retrieval can matter on how item are placed. For list if it's sorted retrieval time would be better. I think your argument is valid for mapping which you haven't explicitly mentioned. – garg10may Oct 07 '15 at 05:09
  • @garg10may the question is specifically about dictionaries, not lists. – Burhan Khalid Oct 07 '15 at 05:10