67

Can I extend Python's dict comprehension syntax to build other dict types, like the OrderedDict in the collections module or my own types which inherit from dict?

Just rebinding the dict name obviously doesn't work: the {key: value} syntax still gives you a plain old dict for both comprehensions and literals.

>>> from collections import OrderedDict
>>> olddict, dict = dict, OrderedDict
>>> {i: i*i for i in range(3)}.__class__
<type 'dict'>

So, if it's possible, how would I go about doing that? It's OK if it only works in CPython. For syntax, I guess I would try an O{k: v} prefix, like the ones we have on the r'various' u'string' b'objects'.

note: Of course we can use a generator expression instead, but I'm more interested seeing how hackable python is in terms of the grammar.

Charles
wim
  • By "can I extend syntax", do you mean by making a custom build of CPython or PyPy or something, or do you mean from within the language? – abarnert Jan 14 '14 at 00:02
  • That's precisely where I'm a bit unsure. Can cpython extensions be used for stuff this hairy, or are we getting into building-your-own-python territory here? – wim Jan 14 '14 at 00:04
  • 1
    You can extend the {} syntax for dictionary comprehensions by using the following code: http://stackoverflow.com/a/7880276/313113 and **You'd need to add a visit_DictComp() method to the DictDisplayTransformer class.** If you *really* need to ;) – Alex Bitek Oct 27 '14 at 09:07
  • 1
    Just move to Python3.6, where [dicts are ordered](https://stackoverflow.com/q/39980323/974555). – gerrit Jul 20 '17 at 19:18

3 Answers

95

Sorry, not possible. Dict literals and dict comprehensions map to the built-in dict type, in a way that's hardcoded at the C level. That can't be overridden.

You can use this as an alternative, though:

from collections import OrderedDict
OrderedDict((i, i * i) for i in range(3))

Addendum: as of CPython 3.6, dictionaries preserve insertion order as an implementation detail. As of 3.7, that ordering is part of the language spec. If you're using those versions of Python and only need insertion order, there's no need for OrderedDict: the dict comprehension will Just Work (TM).
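A small sketch of what the addendum means in practice, assuming Python 3.7 or later. The variable names are only illustrative. It also shows one difference that remains: OrderedDict equality is order-sensitive, while plain dict equality is not.

```python
from collections import OrderedDict

keys = ["b", "a", "c"]

# A plain dict comprehension keeps insertion order on Python 3.7+
plain = {k: k.upper() for k in keys}

# The generator-expression alternative from the answer
ordered = OrderedDict((k, k.upper()) for k in keys)

print(list(plain))
print(list(ordered))

# OrderedDict compares order-sensitively; dict does not
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
plain_equal = dict(a=1, b=2) == dict(b=2, a=1)
print(ordered_equal, plain_equal)
```

So if the code relies on order-sensitive comparison or on methods such as move_to_end, OrderedDict may still be the better fit.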

naught101
Max Noel
  • 1
    FYI, this syntax is a generator expression that is passing tuples to the `OrderedDict()` constructor. To demonstrate, `list( (i, i * i) for i in range(3) )` gives `[(0, 0), (1, 1), (2, 4)]` – wisbucky Nov 18 '21 at 05:24
32

There is no direct way to change Python's syntax from within the language. A dictionary comprehension (or plain display) is always going to create a dict, and there's nothing you can do about that. If you're using CPython, it's using special bytecodes that generate a dict directly, which ultimately call the PyDict API functions and/or the same underlying functions used by that API. If you're using PyPy, those bytecodes are instead implemented on top of an RPython dict object which in turn is implemented on top of a compiled-and-optimized Python dict. And so on.
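One way to see this for yourself in CPython is to disassemble a dict comprehension with the dis module. The sketch below is only an illustration: the helper name opcodes is made up here, and exact bytecode varies between CPython versions, so it only checks for the BUILD_MAP and MAP_ADD opcodes and for the absence of any lookup of the name dict.

```python
import dis
import types


def opcodes(code):
    """Yield (opname, argval) for a code object and any code objects nested in it."""
    for instr in dis.get_instructions(code):
        yield instr.opname, instr.argval
    # Older CPython versions compile the comprehension into a nested code object
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from opcodes(const)


code = compile("{i: i * i for i in range(3)}", "<demo>", "eval")
ops = list(opcodes(code))
opnames = {name for name, _ in ops}

# The dict is built by dedicated opcodes...
print("BUILD_MAP" in opnames, "MAP_ADD" in opnames)

# ...and the name "dict" is never looked up, so rebinding it cannot matter
looks_up_dict = any(arg == "dict" for _, arg in ops)
print(looks_up_dict)
```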

There is an indirect way to do it, but you're not going to like it. If you read the docs on the import system, you'll see that it's the importer that searches for cached compiled code or calls the compiler, and the compiler that calls the parser, and so on. In Python 3.3+, almost everything in this chain either is written in pure Python, or has an alternate pure Python implementation, meaning you can fork the code and do your own thing. Which includes parsing source with your own PyParsing code that builds ASTs, or compiling a dict comprehension AST node into your own custom bytecode instead of the default, or post-processing the bytecode, or…

In many cases, an import hook is sufficient; if not, you can always write a custom finder and loader.
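As a rough sketch of the loader route (not a full import hook, and not necessarily how the answer's author would do it), the code below subclasses importlib.machinery.SourceFileLoader and overrides source_to_code to run an ast.NodeTransformer before compiling. The class names, the demo_mod module, and the choice to rewrite via __import__("collections") are all assumptions made for this example.

```python
import ast
import importlib.machinery
import importlib.util
import os
import tempfile


class OrderedDictComp(ast.NodeTransformer):
    """Rewrite {k: v for ...} into __import__("collections").OrderedDict((k, v) for ...)."""

    def visit_DictComp(self, node):
        self.generic_visit(node)  # rewrite nested comprehensions first
        pair = ast.Tuple(elts=[node.key, node.value], ctx=ast.Load())
        gen = ast.GeneratorExp(elt=pair, generators=node.generators)
        importer = ast.Call(
            func=ast.Name(id="__import__", ctx=ast.Load()),
            args=[ast.Constant(value="collections")],
            keywords=[],
        )
        func = ast.Attribute(value=importer, attr="OrderedDict", ctx=ast.Load())
        call = ast.Call(func=func, args=[gen], keywords=[])
        return ast.copy_location(call, node)


class OrderedDictCompLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader that transforms the AST before compiling."""

    def source_to_code(self, data, path, *, _optimize=-1):
        tree = OrderedDictComp().visit(ast.parse(data, filename=path))
        ast.fix_missing_locations(tree)
        return compile(tree, path, "exec", dont_inherit=True, optimize=_optimize)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as f:
        f.write("squares = {i: i * i for i in range(3)}\n")

    loader = OrderedDictCompLoader("demo_mod", path)
    spec = importlib.util.spec_from_file_location("demo_mod", path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)

print(type(module.squares).__name__)
```

One caveat: SourceFileLoader prefers a cached .pyc when one exists, so a module that was already compiled by the normal loader would bypass the transform. To apply this to ordinary import statements you would additionally need a finder on sys.meta_path that hands out this loader.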

If you're not already using Python 3.3 or later, I'd strongly suggest migrating before playing with this stuff. In older versions, it's harder, and less well documented, and you'll ultimately be putting in 10x the effort to learn something that will be obsolete whenever you do migrate.

Anyway, if this approach sounds interesting to you, you might want to take a look at MacroPy. You could borrow some code from it—and, maybe more importantly, learn how some of these features (that have no good examples in the docs) are used.

Or, if you're willing to settle for something less cool, you can just use MacroPy to build an "odict comprehension macro" and use that. (Note that MacroPy currently only works in Python 2.7, not 3.x.) You can't quite get o{…}, but you can get, say, od[{…}], which isn't too bad. Download od.py, realmain.py, and main.py, and run python main.py to see it working. The key is this code, which takes a DictComp AST node, converts it to an equivalent GeneratorExp over key-value Tuples, and wraps it in a Call to collections.OrderedDict:

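import ast

def od(tree, **kw):
    # tree is the DictComp node written inside od[...]
    pair = ast.Tuple(elts=[tree.key, tree.value], ctx=ast.Load())
    gx = ast.GeneratorExp(elt=pair, generators=tree.generators)
    # Assumes the module using the macro has done `import collections`
    odict = ast.Attribute(value=ast.Name(id='collections', ctx=ast.Load()),
                          attr='OrderedDict', ctx=ast.Load())
    call = ast.Call(func=odict, args=[gx], keywords=[])
    return call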

A different alternative is, of course, to modify the Python interpreter.

I would suggest dropping the O{…} syntax idea for your first go, and just making normal dict comprehensions compile to odicts. The good news is, you don't really need to change the grammar (which is beyond hairy…), just any one of:

  • the bytecodes that dictcomps compile to,
  • the way the interpreter runs those bytecodes, or
  • the implementation of the PyDict type

The bad news: while all of those are a lot easier than changing the grammar, none of them can be done from an extension module. (Well, you can do the first one by doing basically the same thing you'd do from pure Python… and you can do any of them by hooking the .so/.dll/.dylib to patch in your own functions, but that's the exact same work as hacking on Python plus the extra work of hooking at runtime.)

If you want to hack on CPython source, the code you want is in Python/compile.c, Python/ceval.c, and Objects/dictobject.c, and the dev guide tells you how to find everything you need. But you might want to consider hacking on PyPy source instead, since it's mostly written in (a subset of) Python rather than C.


As a side note, your attempt wouldn't have worked even if everything were done at the Python language level. olddict, dict = dict, OrderedDict creates a binding named dict in your module's globals, which shadows the name in builtins, but doesn't replace it. You can replace things in builtins (well, Python doesn't guarantee this, but there are implementation/version-specific things-that-happen-to-work for every implementation/version I've tried…), but what you did isn't the way to do it.
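The shadowing described in this side note can be checked directly. This is a minimal sketch for Python 3 (where the module is called builtins). The result variable names are only illustrative.

```python
import builtins
from collections import OrderedDict

# Rebinding creates a new name in this namespace; it shadows the builtin
dict = OrderedDict

shadowed = dict is OrderedDict               # the local/global name now points at OrderedDict
builtin_replaced = builtins.dict is OrderedDict  # the real builtin is untouched
comp_type = type({i: i * i for i in range(3)})   # the comprehension ignores the name entirely

print(shadowed, builtin_replaced, comp_type.__name__)

# Remove the shadowing name so the builtin is visible again
del dict
```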

abarnert
  • I'm interested in getting involved with the Python C api. Is the C api for 3 substantially different from 2? (My day job is 2 and won't ever go to 3) –  Jan 14 '14 at 02:10
  • 2
    @EdgarAroutiounian: The C API is even more conservative than the language itself—`long` and `unicode` changed to `int` and `str`, but the C types are still `PyLong` and `PyUnicode`. Almost all of the differences are related to new functionality that didn't exist in 2.x. (If you dive into hacking on CPython itself, there are much bigger differences. But in most cases—with the notable exception of Unicode internal storage—3.4 is simpler than 2.7, so it still makes sense to learn the easy way first.) – abarnert Jan 14 '14 at 03:46
  • 1
    @EdgarAroutiounian: Anyway, the best way to get involved with the C API is to build a simple extension that wraps some C library and exposes it to Python in a nice way. The [Extending and Embedding](http://docs.python.org/3.3/extending/index.html) tutorial in the official docs is pretty good. You might want to try doing the same wrapper with `ctypes`/`cffi` and a native extension (and maybe Cython, too) to really understand how things look from the different sides. – abarnert Jan 14 '14 at 03:51
  • "There is an indirect way to do it, but you're not going to like it." -- I like it already :) – Inversus Aug 18 '14 at 10:50
  • Sir, this is one impressive answer. – Yonatan Jan 22 '17 at 19:00
16

Slightly modifying @Max Noel's answer: you can use a list comprehension instead of a generator expression to create an OrderedDict in an ordered way (which, before Python 3.7, is not guaranteed with a dict comprehension).

>>> OrderedDict([(i, i * i) for i in range(5)])
OrderedDict([(0, 0), 
             (1, 1), 
             (2, 4), 
             (3, 9), 
             (4, 16)])
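For comparison, here is a quick check (variable names are illustrative) that the list-comprehension form and the generator-expression form produce equal results. The practical difference is only that the list version builds the full list of pairs in memory before handing it to the constructor.

```python
from collections import OrderedDict

# List comprehension: the list of pairs is built first
from_list = OrderedDict([(i, i * i) for i in range(5)])

# Generator expression: pairs are consumed one at a time
from_gen = OrderedDict((i, i * i) for i in range(5))

print(from_list == from_gen)
print(list(from_gen.items()))
```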
Alexander
  • This seems to give you the same end result as Max's answer. Does this have any benefits/difference over the other? – user694733 Sep 19 '16 at 08:47
  • 1
    @user694733 This shows that you can use `OrderedDict([(0, 2), (2, 5)])` using arbitrary values. – Quentin Pradet Sep 20 '16 at 05:29
  • 1
    @user694733 The OP's question ends with "note: Of course we can use a generator expression instead, but I'm more interested seeing how hackable python is in terms of the grammar." This solution accomplishes the same thing without a generator. – Alexander Sep 20 '16 at 17:00
  • I think generators in Python are a somewhat advanced concept, and not everyone using Python needs to know about them. So it makes sense to use a list comprehension if the people who will read or maintain the script are not experts in the language. – balki Sep 21 '16 at 17:01