__repr__ for (large) composite objects

Question

I would like to have informative representations for my composite objects (i.e., objects composed of other (potentially composite) objects). However, because my code fundamentally deals with high-precision numbers (please don't ask me why I don't just use doubles), I end up with representations like the one you see here: http://pastebin.com/jpLgAfxC. Would it be better to just stick with the default __repr__?

It may sound somewhat obvious, but such things are a matter of how you want an object to represent itself. — Eli Korvigo, Mar 31 '15 at 18:16
Please do not paste your entire object here, put a part of it and use pastebin or some alternative for the entire object. — A.J. Uppal, Mar 31 '15 at 18:17
I try to go with an informative representation that can be `eval`ed — Alan Liddell, Mar 31 '15 at 18:22
I'm not sure I understand the question. "Would it just be better to just stick with the default `__repr__`?" If it looks good and satisfies your requirements, yes; otherwise, no. What more can we add to the matter? — Kevin, Mar 31 '15 at 18:23
The data model docs (https://docs.python.org/2/reference/datamodel.html#object.__repr__) pretty much sum up best practices around this. — Demian Brecht, Mar 31 '15 at 18:26
(Question is more opinion oriented than specific answer but...) In your case I would stick with the default `__repr__` because it's not useful to see those structs in your log. If you can make it accurate enough to feed back in to `__init__(self, ...)` then you may have a use for it. And then modify `__str__` to be something reasonable to use and see in the logs. Of course, uncaught error messages will still use `repr`. — aneroid, Mar 31 '15 at 18:35

Jonathan Eunice · Accepted Answer · 2015-03-31T21:23:31.293

Whether to have a verbose repr depends on what you want to accomplish. For complex or composite objects, I know which I'd prefer of the following:

Point(x=1.12, y=2.2, z=-1.9)
<__main__.Point object at 0x103011890>

They both tell me what type the object is, but only the first is clear about all of the (relevant) values involved, and avoids low-level information that is only relevant on the rarest of occasions.
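
As a rough illustration of the two forms above, here is a minimal sketch. Both classes (`Point` and `BarePoint`) are hypothetical and exist only for this comparison; the first defines `__repr__`, the second falls back on the default.

```python
class Point(object):
    """Hypothetical 3-D point, used only to illustrate the two repr styles."""

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        # Name the class and show every relevant value, in a form that
        # could be pasted back into the interpreter to rebuild the object.
        return '{0}(x={1!r}, y={2!r}, z={3!r})'.format(
            self.__class__.__name__, self.x, self.y, self.z)


class BarePoint(object):
    """Same data, but relies on the default object repr."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


print(repr(Point(1.12, 2.2, -1.9)))      # Point(x=1.12, y=2.2, z=-1.9)
print(repr(BarePoint(1.12, 2.2, -1.9)))  # class name plus a memory address
```

The address in the second line changes from run to run, which is part of why it says so little about the object.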

I like to see the real values. But, yours is a special case, given that your values are so frightfully humongous:

72401317106217603290426741268390656010621951704689382948334809645
87850348552960901165648762842931879347325584704068956434195098288
38279057775096090002410493665682226331178331461681861612403032369
73237863637784679012984303024949059416189689048527978878840119376
5152408961823197987224502419157858495179687559851

Numbers that long cannot be useful for most development or debugging purposes. I'm sure there are times you need the full serialization (to send to and from files, for example), but those have to be fairly rare, no? I can't imagine you really remember all 309 digits, or can determine on visual inspection whether the above number is the same as the one below:

72401317106217603290426741268390656010621951704689382948334809645
87850348552960901165648762842931879347325584704068956434195098288
38279057775096090002410493665682226331178331461681861612403032369
73327863637784679012984303024949059416189689048527978878840119376
5152408961823197987224502419157858495179687559851

They're not the same. But unless you're Spock or The Terminator, you wouldn't know that from a quick glance. (And actually, I've made it easier here, length-wrapping to avoid having to horizontally scroll.)

So I would recommend (massively) shortening their representation, to make the output more tractable. Printing every digit is like printing out the entire chapter text every time you want to print a Chapter object. Overkill.

Instead, try something much shorter and easier to work with. Truncation and/or an ellipsis are useful, e.g.:

72401...59851
7240131710...
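
One way to produce abbreviations like these is sketched below. The helper name `abbreviate` and its `head`/`tail` parameters are made up for illustration, and `big` is the first 309-digit number shown earlier.

```python
def abbreviate(value, head=5, tail=5):
    """Shorten the decimal digits of an integer for display purposes."""
    digits = str(value)
    sign = ''
    if digits.startswith('-'):
        sign, digits = '-', digits[1:]
    if len(digits) <= head + tail:
        return sign + digits  # already short enough, show everything
    if tail:
        return sign + digits[:head] + '...' + digits[-tail:]
    return sign + digits[:head] + '...'  # leading digits only


big = int(
    '72401317106217603290426741268390656010621951704689382948334809645'
    '87850348552960901165648762842931879347325584704068956434195098288'
    '38279057775096090002410493665682226331178331461681861612403032369'
    '73237863637784679012984303024949059416189689048527978878840119376'
    '5152408961823197987224502419157858495179687559851')

print(abbreviate(big))                   # 72401...59851
print(abbreviate(big, head=10, tail=0))  # 7240131710...
```

Note that two different values can share the same abbreviation, so this form alone cannot tell near-identical numbers apart.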

You can use the object id as well. If your high-precision type is HP, then:

HP(0x103011890)

At least then you will be able to tell them apart. One ugliness of using object ids, however, is that objects can be logically equivalent, but if you create multiple objects with the same logical value, they'd have different ids, thus appear different when they are not. You can get around that by creating your own short hash function. There's a bit of an art to hashing, but for reprs, even something simple would work. E.g.:

import binascii, struct

def shorthash(s):
    """
    Given a Python value, produce a short alphanumeric hash that
    helps identify it for debugging purposes. A riff on 
    http://stackoverflow.com/a/2511059/240490
    Enhanced to remove trailing boilerplate, and to work
    on either Python 2 or Python 3.
    """
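    # 'q' is always 8 bytes; 'l' is only 4 bytes on some platforms
    # (e.g. Windows), where a 64-bit hash value would not fit.
    hashbytes = binascii.b2a_base64(struct.pack('q', hash(s)))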
    return hashbytes.decode('utf-8').rstrip().rstrip("=")

Then define your repr in the high-precision class:

def __repr__(self):
    clsname = self.__class__.__name__
    return '{0}({1})'.format(clsname, shorthash(self.value))

Where self.value is whatever local attribute, property, or method creates the multi-hundred-digit value. If you're subclassing int, this could be just self.

This gets you to:

HP(Tea+5MY0WwA)

The two massive, almost identical numbers above? Using this scheme, they render out to:

HP(XhkG0358Fx4)
HP(27CdIG5elhQ)

Which are obviously different. You can combine this with a bit of a value representation. E.g. a few alternatives:

HP(~7.24013e308 @ XhkG0358Fx4)
HP(dig='72401...59851', ndigits=309, hash='XhkG0358Fx4')
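
A minimal sketch of how those combined forms might be put together follows. The `HP` class here is a hypothetical wrapper around a plain Python int, and the names `digits` and `approx` are assumptions; it also assumes non-negative values with more than ten digits. The approximate form is built from the digit string because `float()` raises OverflowError for a value this large (it exceeds the biggest double, about 1.8e308).

```python
import binascii
import struct


def shorthash(value):
    """Short alphanumeric hash of a value, for debugging output only."""
    # 'q' packs a fixed 8-byte signed integer on every platform.
    packed = struct.pack('q', hash(value))
    return binascii.b2a_base64(packed).decode('ascii').rstrip().rstrip('=')


class HP(object):
    """Hypothetical high-precision wrapper; `value` holds a Python int."""

    def __init__(self, value):
        self.value = value

    @property
    def digits(self):
        # Full serialization, for the rare cases that need every digit.
        return str(self.value)

    def approx(self):
        # Scientific-notation style summary taken from the leading digits
        # (truncated, not rounded).
        d = self.digits
        return '~{0}.{1}e{2}'.format(d[0], d[1:6], len(d) - 1)

    def __repr__(self):
        d = self.digits
        return "{0}(dig='{1}...{2}', ndigits={3}, hash='{4}')".format(
            self.__class__.__name__, d[:5], d[-5:], len(d),
            shorthash(self.value))


big = int(
    '72401317106217603290426741268390656010621951704689382948334809645'
    '87850348552960901165648762842931879347325584704068956434195098288'
    '38279057775096090002410493665682226331178331461681861612403032369'
    '73237863637784679012984303024949059416189689048527978878840119376'
    '5152408961823197987224502419157858495179687559851')
# A second value with one pair of adjacent digits transposed.
other = int(str(big).replace('732378', '733278', 1))

print(repr(HP(big)))
print(repr(HP(other)))
print(HP(big).approx())
```

The hash strings depend on the Python version and platform, so they should only be compared within a single run. Two `HP` objects holding equal values print identically even though their ids differ.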

You'll find these shorter values more useful in debugging contexts. You can, of course, keep around a method or property (e.g. .value, .digits, or .alldigits) for those cases in which you need every last bit, but define the common case as something more easily consumed.

This is exceptionally helpful. Thanks. — Alan Liddell, Mar 31 '15 at 21:30

score 0 · Answer 2 · answered Mar 31 '15 at 18:40

Thank you to Demian for the pointer to https://docs.python.org/2/reference/datamodel.html#object.__repr__, specifically:

This is typically used for debugging, so it is important that the representation is information-rich and unambiguous.

http://pastebin.com/jpLgAfxC is probably the best possible __repr__ in this case.

Alan Liddell

I can't agree that a 309-digit value is the best repr. Sure, it's information rich and unambiguous. So's the full Oxford English Dictionary. Large texts are inherently difficult for humans to immediately comprehend and "diff." For writing to permanent storage, yes, print all the digits. For debugging, I recommend something shorter. – Jonathan Eunice Mar 31 '15 at 19:41
