0

I am trying to migrate a Python 2 dictionary to Python 3.

The dict ordering behaviour has of course changed:

  • Python 2: Unordered - but in practice appears to be deterministic/consistent between subsequent calls of code using the same key-value pairs inserted in a consistent order on the same machine etc. The number of possible reorderings is restricted (though that is computer science territory from my point-of-view).

  • Python 3.7+: Insertion order is guaranteed.

I am looping over the elements of the dictionary and adding the key-value pairs to another structure in a way that is a function of the dict values that is order-dependent. Specifically, I am removing the elements based on their geometric proximity to certain landmarks ('nearest neighbours'). If the ordering changes, the removal order changes and the outcome is not the same.

I am trying to reproduce Python 2's exact mangling of the insertion order. This can be thought of as a python function or 'behaviour'. So, how can I replicate a Python 2 dict ordering in Python 3.7+?

Caveats:

  1. Of course I can use an OrderedDict in Python 2 (or 3.7) and often do (and should have here). But for the purposes of this question please don't suggest it, or that not using one was a bad idea etc.

  2. I could reorder the keys of the original dictionary. Could this be a route to solving the conundrum? In general can Python 2 be "emulated" in Python 3?

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
jtlz2
  • 7,700
  • 9
  • 64
  • 114
  • 1
    What are you trying to solve or is this purely for your curiosity? What situation is there where dicts now being ordered is an issue for a migration? – Iain Shelvington Nov 19 '21 at 08:29
  • Note dictionaries still aren't _semantically_ ordered in Python 3.7+ - equality comparisons ignore order, for example. This feels a lot like an [XY problem](https://meta.stackexchange.com/q/66377/248731), could you provide the context that's led you to ask this? – jonrsharpe Nov 19 '21 at 08:39
  • @IainShelvington My response to your questions was becoming too long so I am going to put in the question. – jtlz2 Nov 19 '21 at 08:39
  • 1
    "I could reorder the keys of the original dictionary." - as in, the dict in the Python 2 code? It seems very strange that you would be able to do that, but not able to use an OrderedDict. – user2357112 Nov 19 '21 at 08:42
  • 2
    "Python 2 dict ordering" makes no sense because there was no form of ordering [that you could rely on]. So it seems that you want some kind of **UN**ordered dictionary in Python 3+. Well, you could take your original dictionary and construct a new dictionary based on random samples (of keys) taken from the original. But why??? –  Nov 19 '21 at 08:47
  • We'd rather you _didn't_ appeal to curiosity, because that lacks the constraints and alternatives of an actual problem. Dictionary order isn't a behaviour pre-3.7 because _dictionaries aren't ordered_. Why does removal order matter - presumably the _outcome_ is the same? – jonrsharpe Nov 19 '21 at 08:52
  • 1
    I'm more baffled by the specific alternatives you do or don't consider a viable option, than I am by the task itself. – user2357112 Nov 19 '21 at 08:54
  • @jtlz2 are you saying that in Python2.7 two dictionaries with the same keys, in the same process, will iterate over their keys in the same order and that's behaviour you want to recreate? – Iain Shelvington Nov 19 '21 at 08:56
  • @IainShelvington On a given machine, version etc. - yes, consistent. – jtlz2 Nov 19 '21 at 08:57
  • @jonrsharpe Re: editing - How was I to address all these questions except in the question..? Isn't that perfectly reasonable behaviour or should I address them in the comments? I often see questions updated to address comments. Is quotation out of bounds too? I think your rollback makes the question more likely to be closed. Especially since the close votes are related to "clarity". – jtlz2 Nov 19 '21 at 08:59
  • Reproducing behavior that's intentionally left unspecified doesn't make sense to me. – bereal Nov 19 '21 at 09:01
  • "Reproducing behavior that's intentionally left unspecified doesn't make sense to me." I added more context in response to all of the comments above, but the edits were removed by @jonrsharpe – jtlz2 Nov 19 '21 at 09:02
  • @jtlz2 I think it's worth adding to the question that you want to have two dictionaries that you can iterate over in the same key order independent of the insertion order – Iain Shelvington Nov 19 '21 at 09:02
  • I mean, I can easily see why someone would want to recreate dict order they got in Python 2. "Oh no, I didn't realize these outputs had to be reproducible! (Or maybe it's legacy code someone else wrote.) Now I have to replicate the output of a program that was never designed for reproducible results. This is going to suck." It's not all that uncommon a story. It happens with RNGs too. – user2357112 Nov 19 '21 at 09:02
  • @user2357112supportsMonica So there's a practical application - exactly! Thank you. I am going to refrain from adding clarity to the question for now until I get some clarity on what I can or can't add to it. I don't want to get into a bun fight :( – jtlz2 Nov 19 '21 at 09:03
  • What's really confusing is that some of the things you've given as viable alternatives *would* change the original order, or would change even more than that. – user2357112 Nov 19 '21 at 09:03
  • I dodn't just roll back - I carefully _kept_ the new information. If future visitors want to see how the question evolved that's all in the edit history, but the overwhelming majority will only care about the current state, which should be a coherent question not a conversation. It also removed some duplication (you said _"(see above)"_ yourself!) – jonrsharpe Nov 19 '21 at 09:03
  • @jonrsharpe OK - then I will direct interested parties to the edit history :) Thanks for caring! – jtlz2 Nov 19 '21 at 09:09
  • @jtlz2 could you just create a class/method that iterated over a dict in sorted key order? This would give you a consistent order between your two dicts no matter the insertion order – Iain Shelvington Nov 19 '21 at 09:21

1 Answers1

2

I am trying to reproduce Python 2's exact mangling of the insertion order. This can be thought of as a python function or 'behaviour'. So, how can I replicate a Python 2 dict ordering in Python 3.7+?

You cant.

I could reorder the keys of the original dictionary. Could this be a route to solving the conundrum? In general can Python 2 be "emulated" in Python 3?

No. Not without reimplementing Python 2's hashmap anyway (or compiling the "old" dict under a new name).

The Python 2 ordering (and that of hashmaps in general) is a consequence of the hash function, the collision resolution algorithm and the resizing thresholds (the load factor):

  • a hashmap is fundamentally an array of buckets, without special handling the iteration order is simply that of iterating the buckets array and yielding the entries in the non-empty buckets
  • item is stored at buckets[hash(key) % len(buckets)], hence the implications of the hash algorithm and size of the buckets array
  • if two items have the same modular hash, the collision will be resolved, for open addressing there are lots of algorithms (all of which end up with putting one of the items in the "wrong" bucket, but the way they do and which one they move can be quite different)
  • different collision resolution algorithms have different sensitivities to the buckets array being full, some will start degrading at 70% while others will be fine at 90%, this thus informs the load factor limits, therefore when you resize, and thus the new locations (and order) of the items
  • furthermore different hash tables can have different growth factors, even if all other factors are identical different growth factors will lead to different sizes of buckets arrays, and thus different distribution (and order) once modulo is applied

Since Python 3.6, P3 has changed the very implementation of hashmaps to use one that's "naturally ordered"1, the only way to get the not-insertion-order would be to interate the dk_indices field of the hashmap then explicitly dereference the corresponding dk_entries item, and neither field is part of the API (at the C level to say nothing of the Python), and even if they were you would have to check the implementation details to see if the new hashmap uses the same hashing and growth thresholds. Which again doesn't matter, because the internal details are not accessible.

[1]: you can see pypy's explanation of the scheme for a primer, though CPython's implementation is different/

Masklinn
  • 34,759
  • 3
  • 38
  • 57
  • https://pyvideo.org/pycon-us-2010/the-mighty-dictionary-55.html and https://pyvideo.org/pycon-us-2017/modern-python-dictionaries-a-confluence-of-a-dozen-great-ideas.html might be useful here. – jonrsharpe Nov 19 '21 at 09:07
  • "Not without reimplementing Python 2's hashmap anyway." Is that not possible? I mean, it was done for Python 2. Is it written in Python, or not (I assume not)? If it's 'just' a function, with no randomness (or randomness that can be seeded), then why not define the (complicated function), and the seeds, and go from there. – jtlz2 Nov 19 '21 at 09:08
  • 2
    @jtlz2 no, its implementation is... implementation-dependent. In CPython, which is the "default" Python you're probably using, it's implemented in C. – jonrsharpe Nov 19 '21 at 09:09
  • 1
    To put it another way: it's not _randomly_ ordered, it's _arbitrarily_ ordered. Given the same "history" (of insertion, removal, etc.) and, post-3.3, the same [hash seed](https://stackoverflow.com/q/27522626/3001761), you get the same order. But the dictionary isn't a _semantically_ ordered data structure even post-3.7 - that's why `OrderedDict` is still a thing. – jonrsharpe Nov 19 '21 at 09:16
  • 2
    @jtlz2 of course it's possible, but it's a huge amount of work. And no it's not witten in Python, the old implementation is [4000 lines of C](https://github.com/python/cpython/blob/v3.5.10/Objects/dictobject.c) – Masklinn Nov 19 '21 at 09:16
  • @jonrsharpe Arbitrarily ordered - exactly. – jtlz2 Nov 19 '21 at 09:17
  • 1
    @jtlz2: The definition of the Python 2 language didn't specify the ordering of values added to dictionaries, and whatever it was merely an implementation detail. In Python 3.7 the preservation of insertion order was added to the language definition (regardless of the implementation). Because of the former what you want to do cannot be done — you shouldn't have written Py2 code that counted on it being a certain way in the first place. – martineau Nov 19 '21 at 09:30
  • @martineau "you shouldn't have written Py2 code that counted on it being a certain way in the first place." - but I did, unknowingly, and that led to the question. I have usually been rightly paranoid to the extent of using `OrderedDict`s rather than `dict`s - but this came from a JSON load and I did not specify anything else :( – jtlz2 Nov 19 '21 at 09:37
  • 2
    @martineau unwittingly depending on iteration ordering is rather easy in non-trivial software, I had to deal with that quite a bit when migrating an old codebase from Python 2 to Python 3 (what with Python 3's hash randomisation). Go doesn't randomise *every hashmap iteration* just because they find it hilarious. e.g. a common way to get bit is to simply sum floats stored in a hashmap, because it's floats the order of operations can influence the result. – Masklinn Nov 19 '21 at 09:39
  • @jtlz2: Using `OrderedDict`s was the "right thing" to do in Py2 to guarantee ordering — that's why the class was created. They still work in Python 3, although are no longer necessary for 3.7+. There was a way in Py2 to make to JSON decoding use `OrderedDict`s instead of regular `dict`s if order was important. See [Can I get JSON to load into an OrderedDict?](https://stackoverflow.com/questions/6921699/can-i-get-json-to-load-into-an-ordereddict) – martineau Nov 19 '21 at 09:51
  • @martineau Indeed - thanks. – jtlz2 Nov 19 '21 at 09:54