3

I work with a number of json-like dicts. pprint is handy for structuring them. Is there a way to cause all ints in a pprint output to be printed in hex rather than decimal?

For example, rather than:

{66: 'far',
 99: 'Bottles of the beer on the wall',
 '12': 4277009102,
 'boo': 21,
 'pprint': [16, 32, 48, 64, 80, 96, 112, 128]}

I'd rather see:

{0x42: 'far',
 0x63: 'Bottles of the beer on the wall',
 '12': 0xFEEDFACE,
 'boo': 0x15,
 'pprint': [0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80]}

I have tried customizing PrettyPrinter, but to no avail, was I able to cause the above, having PrettyPrinter.format() handle integers only seems to work for some of the integers:

class MyPrettyPrinter(PrettyPrinter):
    def format(self, object, context, maxlevels, level):
        if isinstance(object, int):
            return '0x{:X}'.format(object), True, False
        return super().format(object, context, maxlevels, level)

the above class produces

{0x42: 'far',
 0x63: 'Bottles of the beer on the wall',
 '12': 0xFEEDFACE,
 'boo': 0x15,
 'pprint': [16, 32, 48, 64, 80, 96, 112, 128]}

The list contents are not correctly formatted.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Travis Griggs
  • 21,522
  • 19
  • 91
  • 167
  • I give up. Take it from me: Don't try to customize the `pprint` module. It's _much_ easier and cleaner to reimplement the whole damn thing. They should rewrite the docs to remove any mention of customization. – Aran-Fey Mar 30 '18 at 10:13
  • @Aran-Fey: I rewrote just `_safe_repr`. I'll probably end up suggesting a refactor of the module to Python based on my work below, because using `singledispatch` would make `pprint` much more extensible out of the box. I imagine third-party libraries could register their types with `pprint` by default with this, and for the OPs usecase to be trivial (*just register your type-annotated handler directly with the module* style instructions). – Martijn Pieters Mar 30 '18 at 12:21
  • @MartijnPieters You can just wrap `_safe_repr` to handle your case and then fall the original instead of rewriting the whole thing, which isn’t _quite_ as nasty, but it’s still ugly. But the bigger problem is that you’re hooking all instances this way instead of just the one you create, which defeats the purpose of having a class. If you refactor things, you’d want to make it so you can register types with the instance (although being able to also register with the module would be nice, since nobody actually uses separate instances in practice...). – abarnert Mar 30 '18 at 14:46
  • @abarnert: I rather not set a global policy here (which is what monkeypatching `_safe_repr()` would do). If I refactor, you probably pass in an alternative `saferepr()` function and are given the opportunity to reuse the standard one to extend, or register with the one used by the module. – Martijn Pieters Mar 30 '18 at 14:49
  • @MartijnPieters Yes, that’s what I meant by defeating the purpose of having a class. Your answer here gets around that by basically dedicating a separate module to the subclass so it can still do the equivalent of a global call to `_safe_repr`, but you’d really want the dispatcher to be saved with the instance so that it’s easy to allow people to hook it at whichever level they want. – abarnert Mar 30 '18 at 14:58
  • Perhaps integrating the dispatch into the class (deprecating saferepr) is better. That’s let you handle both modes (one-line and multi-line handling) with a single dispatcher per type, if that can be seen as practical. – Martijn Pieters Mar 30 '18 at 15:05

2 Answers2

6

You can alter the output of pprint, but you need to re-implement the saferepr() function, not just subclass the pprint.PrettyPrinter() class.

What happens is that (an internal version of) the saferepr() function is used for all objects, and that function itself then recursively handles turning objects into representations (using only itself, not the PrettyPrinter() instance), so any customisation has to happen there. Only when the result of saferepr() becomes too large (too wide for the configured width) will the PrettyPrinter class start breaking up container output into components to put on separate lines; the process of calling saferepr() is then repeated for the component elements.

So PrettyPrinter.format() is only responsible for handling the top-level object, and every recursive object that is a) inside a supported container type (dict, list, tuple, string and the standard library subclasses of these) and b) where the representation of the parent container produced by .format() exceeded the display width.

To be able to override the implementation, we need to understand how the .format() method and the saferepr() implementation interact, what arguments they take and what they need to return.

PrettyPrinter.format() is passed additional arguments, context, maxlevels and level:

  • context is used to detect recursion (the default implementation returns the result of _recursion(object) if id(object) in context is true.
  • when maxlevels is set and level >= maxlevels is true, the default implementation returns ... as the contents of a container.

The method is also supposed to return a tuple of 3 values; the representation string and two flags. You can safely ignore the meaning of those flags, they are actually never used in the current implementation. They are meant to signal if the produced representation is 'readable' (uses Python syntax that can be passed to eval()) or was recursive (the object contained circular references). But the PrettyPrinter.isreadable() and PrettyPrinter.isrecursive() methodsactually completely bypass .format(); these return values seem to be a hold-over from a refactoring that broke the relationship between .format() and those two methods. So just return a representation string and whatever two boolean values you like.

.format() really just delegates to an internal implementation of saferepr() that then does several things

  • handle recursion detection with context, and depth handling for maxlevels and level
  • recurse over dictionaries, lists and tuples (and their subclasses, as long as their __repr__ method is still the default implementation)
  • for dictionaries, sort the key-value pairs. This is trickier than it appears in Python 3, but this is solved with a custom _safe_tuple sorting key that approximates Python 2's sort everything behaviour. We can re-use this.

To implement a recursive replacement, I prefer to use @functools.singledispatch() to delegate handling of different types. Ignoring custom __repr__ methods, handling depth issues, recursion, and empty objects, can also be handled by a decorator:

import pprint
from pprint import PrettyPrinter
from functools import singledispatch, wraps
from typing import get_type_hints

def common_container_checks(f):
    type_ = get_type_hints(f)['object']
    base_impl = type_.__repr__
    empty_repr = repr(type_())   # {}, [], ()
    too_deep_repr = f'{empty_repr[0]}...{empty_repr[-1]}'  # {...}, [...], (...)
    @wraps(f)
    def wrapper(object, context, maxlevels, level):
        if type(object).__repr__ is not base_impl:  # subclassed repr
            return repr(object)
        if not object:                              # empty, short-circuit
            return empty_repr
        if maxlevels and level >= maxlevels:        # exceeding the max depth
            return too_deep_repr
        oid = id(object)
        if oid in context:                          # self-reference
            return pprint._recursion(object)
        context[oid] = 1
        result = f(object, context, maxlevels, level)
        del context[oid]
        return result
    return wrapper

@singledispatch
def saferepr(object, context, maxlevels, level):
    return repr(object)

@saferepr.register
def _handle_int(object: int, *args):
    # uppercase hexadecimal representation with 0x prefix
    return f'0x{object:X}'

@saferepr.register
@common_container_checks
def _handle_dict(object: dict, context, maxlevels, level):
    level += 1
    contents = [
        f'{saferepr(k, context, maxlevels, level)}: '
        f'{saferepr(v, context, maxlevels, level)}'
        for k, v in sorted(object.items(), key=pprint._safe_tuple)
    ]
    return f'{{{", ".join(contents)}}}'

@saferepr.register
@common_container_checks
def _handle_list(object: list, context, maxlevels, level):
    level += 1
    contents = [
        f'{saferepr(v, context, maxlevels, level)}'
        for v in object
    ]
    return f'[{", ".join(contents)}]'

@saferepr.register
@common_container_checks
def _handle_tuple(object: tuple, context, maxlevels, level):
    level += 1
    if len(object) == 1:
        return f'({saferepr(object[0], context, maxlevels, level)},)'
    contents = [
        f'{saferepr(v, context, maxlevels, level)}'
        for v in object
    ]
    return f'({", ".join(contents)})'

class HexIntPrettyPrinter(PrettyPrinter):
    def format(self, *args):
        # it doesn't matter what the boolean values are here
        return saferepr(*args), True, False

This hand-full can handle anything the base pprint implementation can, and it will produce hex integers in any supported container. Just create an instance of the HexIntPrettyPrinter() class and call .pprint() on that:

>>> sample = {66: 'far',
...  99: 'Bottles of the beer on the wall',
...  '12': 4277009102,
...  'boo': 21,
...  'pprint': [16, 32, 48, 64, 80, 96, 112, 128]}
>>> pprinter = HexIntPrettyPrinter()
>>> pprinter.pprint(sample)
{0x42: 'far',
 0x63: 'Bottles of the beer on the wall',
 '12': 0xFEEDFACE,
 'boo': 0x15,
 'pprint': [0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80]}

Side note: if you are using Python 3.6 or older you'll have to replace the @saferepr.registration lines with @saferepr.registration(<type>) calls, where <type> duplicates the type annotation on the first argument of the registered function.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
0

Update

I have implemented my concept in code. This seems to work fairly well.

Just use pph for "pretty print hex" or "ppf" for "pretty print hex (format)" (returns results).

from pprint import PrettyPrinter
pp = PrettyPrinter(indent=4).pprint
pf = PrettyPrinter(indent=4).pformat
def pph(o):
    print(re.sub(r"((?:, +|: +|\( *|\[ *|\{ *)-?)(\d\d+)(?=[,)}\]])", lambda m: m.group(1) + hex(m.group(2)), pf(o)))
def pfh(o):
    return re.sub(r"((?:, +|: +|\( *|\[ *|\{ *)-?)(\d\d+)(?=[,)}\]])", lambda m: m.group(1) + hex(m.group(2)), pf(o))

Original Post

Wow, sounds really complex. Can I ask what's wrong with just doing something like this?

d = pprint.pformat(data)
print re.sub(r'(\b\d+)L', lambda x: "0x{:x}".format(int(x.group(1))), d)

It worked on my data, which was admittedly all long rather than int (providing the convenient L anchor), and there are no cases of quoted literal numbers -- but such could be easily dealt with with

re.split(r"('[^']+')", d)

I'll grant that it isn't a pretty solution, but given the alternative, at least it's not complicated either.

{'funcStartRanges': [],
 'noCodeRanges': [],
 'noOwnerRanges': [{'last': 0x140ce1332, 'length': 0x12, 'start': 0x140ce1321},
                   {'last': 0x140ce1332, 'length': 0x12, 'start': 0x140ce1321}],
 'otherOwnerRanges': [{'last': 0x140ce1332,
                       'length': 0x12,
                       'start': 0x140ce1321}],
 'weOwnItRanges': []}
Orwellophile
  • 13,235
  • 3
  • 69
  • 45