This question arises from this answer where one user uses d.keys() and d.values() separately to initialise a dataframe.

It's common knowledge that dictionaries in Python versions before 3.6 are not ordered.

Consider a generic dictionary of the form:

d = {k1 : v1, k2 : v2, k3 : v3}

Here the keys k* are any hashable objects and the values v* are any objects. Of course, the order of the dictionary itself cannot be guaranteed, but what about the relative order of d.keys() and d.values()?

Python 2.x

Both d.keys() and d.values() return lists. Say, .keys() returns d's keys in the order [k2, k1, k3]. Is it now always guaranteed that d.values() returns the same relative ordering as [v2, v1, v3]? Furthermore, does the ordering remain the same no matter how many times these functions are called?

Python 3.x (<3.6)

I'm not 100% sure, but I believe that .keys() and .values() do not guarantee any ordering at all here because they are set-like structures, thus having no order by definition and enabling you to perform set-like operations on them. But I'd still be interested to know if there is any sort of relative ordering between the two calls in this instance. I'm guessing not. I'd appreciate it if someone could confirm or correct me.
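For readers who want to test the set-like assumption above, here is a minimal sketch, assuming Python 3 and a small made-up dictionary with string keys (the names `d`, `common`, and `pairs_match` are illustrative only). It suggests that only the keys view (and the items view) supports set operations, and that being set-like does not prevent a view from iterating in the dictionary's own order.

```python
# Hypothetical example data; any hashable keys would do.
d = {"k1": "v1", "k2": "v2", "k3": "v3"}

keys_view = d.keys()
values_view = d.values()

# The keys view supports set operations; the result is a plain set.
common = keys_view & {"k1", "k3", "missing"}

# The values view is a view too, but it is not set-like:
# values need not be unique or hashable, so `&` is not defined for it.
try:
    values_view & {"v1"}
    values_supports_and = True
except TypeError:
    values_supports_and = False

# Both views iterate in the dict's own order, so pairing them
# element by element reproduces items().
pairs_match = list(zip(keys_view, values_view)) == list(d.items())

print(common, values_supports_and, pairs_match)
```

This only illustrates the behaviour of the views; it says nothing about which order the dictionary itself uses.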

user4020527
cs95
  • In Python 3.6+ (I think) they're ordered; the question is not whether you can use it, but whether you should (I'm in the "you should" camp). See also https://stackoverflow.com/a/39980744/1240268 – Andy Hayden Nov 08 '17 at 05:49
  • Looks like `keys()` and `values()` come out in the same order. In python 2 it's arbitrary (but consistent) - but in python 3 it seems to be in definition order. Either way, it isn't a good idea to rely on this... There's no reason to, with `items()`. Python 2 = https://repl.it/NwCu/0 Python 3 = https://repl.it/NwDC/0 – Shadow Nov 08 '17 at 05:51
  • @AndyHayden - the fact it's not guaranteed in the spec is by far the most overwhelming reason to not rely on it in my opinion :P They even say in the docs that it's an implementation detail that you shouldn't rely on... – Shadow Nov 08 '17 at 05:54
  • I'm reeling at the fact that a [highly upvoted answer](https://stackoverflow.com/a/23600844/4909087) has made some assumptions about ordering and nobody batted an eyelid. – cs95 Nov 08 '17 at 05:55
  • @AndyHayden Another thing I find interesting is that, when passing dictionaries to `pd.DataFrame`, you almost always get a dataframe with columns in the same order as the keys.... astounded how that's always happened, with little to no exception. – cs95 Nov 08 '17 at 05:56
  • @Shadow I'm with [Raymond Hettinger](https://twitter.com/raymondh/status/850102884972675072). I also use f-strings everywhere now so everyone is forced to upgrade if they want to run my code. – Andy Hayden Nov 08 '17 at 05:58
  • @cᴏʟᴅsᴘᴇᴇᴅ as pointed out, in python 2 it was deterministic, early python 3 random, now it's insertion/written order. It'll be in the spec for sure, the behavior is so useful for kwargs etc. etc. I've seen reliance on deterministic ordering in python 2 break (early) python 3 code - not fun. – Andy Hayden Nov 08 '17 at 06:01
  • @cᴏʟᴅsᴘᴇᴇᴅ That said, it turns out it always works for python 2.7 and python 3.6+. Provided you don't do anything silly like modify the dict mid iteration. – Andy Hayden Nov 08 '17 at 06:11
  • @AndyHayden Thanks. I saw the [CPython implementation footnote here](https://docs.python.org/2/library/stdtypes.html#dict.items), so I guess it's just best to never rely on any ordering. – cs95 Nov 08 '17 at 06:24
  • Also the part after the footnote that mentions the items directly correspond with each other. – cs95 Nov 08 '17 at 06:25
  • @cᴏʟᴅsᴘᴇᴇᴅ and for python 3 also https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects. Which is to say the answer you linked to isn't broken hmmm, it's still a little fishy - mainly as you want the .keys() and .values() calls to be close together. If there was somewhere to bet that it'd make it to the spec I would. – Andy Hayden Nov 08 '17 at 06:29
  • @cᴏʟᴅsᴘᴇᴇᴅ for what it's worth, I left a comment on [that answer](https://stackoverflow.com/a/23600844/3058609) pointing out this discussion and suggesting using `ks, vs = zip(*d.items())` instead. – Adam Smith Nov 08 '17 at 07:03
  • @AndyHayden For `**kwargs` it is already in the spec, [starting from version 3.6](https://www.python.org/dev/peps/pep-0468/#specification). Still would not rely on dicts in general having that property. – Ilja Everilä Nov 08 '17 at 07:03
  • It has **always** been the case that if you call `d.keys()` and `d.values()` in either order, with no intervening insertions or deletions, then corresponding items from those two sequences will be a correct (key, value) pair. – PM 2Ring Nov 08 '17 at 07:51
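The `ks, vs = zip(*d.items())` suggestion from the comments can be sketched as follows. This is a minimal illustration, not code from the linked answer: the helper name `split_items` and the sample dictionary are hypothetical, and the empty-dict guard is an addition, since unpacking `zip()` of nothing into two names would raise a `ValueError`.

```python
def split_items(mapping):
    """Return (keys, values) as two tuples taken from one pass over items().

    Hypothetical helper for illustration; both tuples come from the same
    items() call, so no assumption about two separate calls is needed.
    """
    if not mapping:
        # zip(*[]) yields nothing, so handle the empty case explicitly.
        return (), ()
    keys, values = zip(*mapping.items())
    return keys, values


d = {"b": 2, "a": 1, "c": 3}
ks, vs = split_items(d)

# Each key still lines up with its own value.
rebuilt = dict(zip(ks, vs))
print(ks, vs, rebuilt == d)
```

Because both tuples come from a single `items()` call, the pairing holds regardless of how the dictionary happens to order its entries.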

1 Answer

The general rules:

  1. Before discussing what is and isn't guaranteed: even if some ordering seems to be "guaranteed", treat it as if it isn't. You should not rely on it. It is considered bad practice, and could lead to nasty bugs.
  2. d.keys(), d.values(), and d.items() all return their elements in corresponding order: the i-th key, the i-th value, and the i-th item belong together. The order itself should be treated as arbitrary (no assumptions should be made about it). (docs)
  3. Consecutive calls to d.keys(), d.values(), and d.items() are "stable", in the sense that they are guaranteed to preserve the order of previous calls (assuming no insertion/deletion happens between the calls).
  4. Since CPython 3.6, dict has been reimplemented, and it now preserves insertion order. This was not the goal of the change but a side effect, and in 3.6 it is NOT part of the Python spec, only a detail of the CPython implementation. See point #1 above: relying on it there is bad practice, and you should avoid writing CPython-specific code anyway. (As a later comment on this answer notes, Python 3.7 and later do guarantee insertion order as part of the language.)
  5. In Python 2, order is deterministic (i.e. creating a dict twice in the same way will result in the same order). In Python 3 versions before 3.6, it is no longer deterministic from one interpreter run to the next, so you can't rely on that either (I'm not sure if this non-determinism is part of the spec or just a CPython implementation detail).

EDIT: added point #5, thanks to @AndyHayden's comment.
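Points #2 and #3 can be checked with a minimal sketch like the one below. The dictionary and variable names are made up for illustration, and the last part shows why the "no insertion/deletion between the calls" condition matters.

```python
d = {"x": 1, "y": 2, "z": 3}

# Point #3: repeated calls on an unmodified dict give the same order.
first = list(d.keys())
second = list(d.keys())
stable = first == second

# Point #2: keys() and values() line up with items().
corresponds = list(zip(d.keys(), d.values())) == list(d.items())

# The guarantee only covers an unmodified dict. If the dict changes
# between the two calls, the saved keys no longer describe it.
keys_before = list(d.keys())
d["w"] = 0
del d["x"]
values_after = list(d.values())
stale_pairs = list(zip(keys_before, values_after))
pairs_still_valid = stale_pairs == list(d.items())

print(stable, corresponds, pairs_still_valid)
```

The stale pairing goes wrong on any version, because `keys_before` still contains a key that has since been deleted.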

shx2
  • Thanks, this reaffirms my suspicions. Do you have any sources you can draw from? – cs95 Nov 08 '17 at 06:01
  • Especially part 3) about the stable ordering is very useful information that I didn't know about. So, to summarise, whether there is relative ordering or not, I shouldn't assume there is. Am I right? – cs95 Nov 08 '17 at 06:04
  • For any future readers, see CPython implementation footnote [here](https://docs.python.org/2/library/stdtypes.html#dict.items), and the bit about guaranteed correspondence just below. – cs95 Nov 08 '17 at 06:29
  • @AndyHayden I don't have a reference, but when you think about it, it is a direct result of point #2. If calling `keys()` and then `values()` preserves order, and calling `values()` and then `keys()` preserves order, it follows that calling `keys()` and then `keys()` preserves order. It is easy to give a formal proof for that. – shx2 Nov 08 '17 at 06:59
  • The non-determinism mentioned in point #5 is due to the non-determinism of the Python 3 `hash` function, which by default uses a random value to seed the hashes of str, bytes and datetime objects. You can set the PYTHONHASHSEED environment variable to a (machine) integer to make it deterministic. This is briefly mentioned in the command-line help, although that help claims that you need to set PYTHONHASHSEED to 'random' to invoke the random behaviour; that hasn't been true for several versions. – PM 2Ring Nov 08 '17 at 07:39
  • Note that this phenomenon also affects the ordering of sets. FWIW, the hash of a machine integer is just the integer itself, so sets of such integers, and dicts that use such integers for the keys, are deterministic. – PM 2Ring Nov 08 '17 at 07:39
  • Point #5 is rather misleading. The non-determinism in Python <3.6 doesn't come from the dict but from the strings (how they hash themselves), assuming that's what Andy showed. And you can have similarly non-deterministically hashing keys in Python 2 as well. A good new point would be that Python ≥3.7 guarantees insertion order; it's not just a CPython implementation detail anymore. – Kelly Bundy Aug 25 '23 at 14:57
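The hash-seed behaviour described in these comments can be explored with a small sketch, assuming CPython 3 and that child interpreters can be started via `sys.executable`. The helper name `str_hash_in_child` and the seed value 123 are illustrative choices, not anything taken from the discussion.

```python
import os
import subprocess
import sys


def str_hash_in_child(seed):
    """Start a fresh interpreter with PYTHONHASHSEED=seed; return hash('abc')."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('abc'))"], env=env
    )
    return int(out.decode().strip())


# With a fixed seed, string hashes repeat from one run to the next.
same_seed_a = str_hash_in_child(123)
same_seed_b = str_hash_in_child(123)

# Small integers hash to themselves in CPython, whatever the seed.
int_hash = hash(42)

print(same_seed_a == same_seed_b, int_hash)
```

Without a fixed seed, the string hash would normally differ between runs, which is what made the iteration order of string-keyed dicts and sets vary in the older Python 3 versions discussed here.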