What I would like

In this example, I would like to see the unicode string without using print:

In [1]: a = u's·A/m'

In [2]: type(a)
Out[2]: str

In [3]: a
Out[3]: 's\xc2\xb7A/m'

In [4]: print a
s·A/m

How can I force the string's __repr__ to display s·A/m instead of s\xc2\xb7A/m?

What is the use-case?

I have a class that represents a number in association with its units for example:

class MyNumber(float):
    def __new__(cls, ...): 
        ...

    def __repr__(self):
        return str(self) + str(self.units)

When I am working in IPython, I would like to quickly see the content of an instance:

>>> a = MyNumber('23.43', ampere=1, second=1, meter=-1)
>>> a
23.43 s·A/m

Instead I get an exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 11: ordinal not in range(128)

And if I call __repr__ manually, I get this:

>>> a.__repr__()
23.43 s\xc2\xb7A/m
nowox
  • @nowox What's `type(a.__repr__())` and where exactly does your exception occur (this is shown by the so-called traceback of the exception which waits to be edited into the question)? – glglgl Dec 04 '15 at 17:20
  • `type(a.__repr__())` is `str` – nowox Dec 04 '15 at 17:21
  • You've tagged the question `python-2.7`, but your IPython output says `type(a)` is `str` on line `[2]`. It should say `unicode`. Then, `[3]` shows it as a UTF-8-encoded byte string. This doesn't look like it was cut-n-pasted from a real trace. – Mark Tolonen Dec 04 '15 at 18:17

3 Answers

1

Instead of returning a byte string from __repr__, return a Unicode string.

def __repr__(self):
    return unicode(self) + self.units.decode('utf-8')

If self.units is already a Unicode string:

def __repr__(self):
    return unicode(self) + self.units
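
To illustrate the distinction this answer relies on, here is a minimal sketch of the round trip between a Unicode string and its UTF-8 byte string. It uses explicit `u''` and `b''` literals, so it should behave the same on Python 2.7 and Python 3. The variable names are illustrative and do not come from the question.

```python
# -*- coding: utf-8 -*-
# Text (Unicode) form of the units; U+00B7 is the middle dot.
units_text = u's\u00b7A/m'

# Encoding turns text into bytes; this is the form shown in Out[3].
units_bytes = units_text.encode('utf-8')

# Decoding turns those bytes back into the original text.
round_trip = units_bytes.decode('utf-8')
```

The text has five characters while the byte string has six bytes, because the middle dot takes two bytes in UTF-8.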
Mark Ransom
  • @nowox You can't `decode` a Unicode string. If you try, Python will first try to `encode` it, which is what generates the error. Did you even try the code I gave here? – Mark Ransom Dec 04 '15 at 18:15
  • Sorry, your final conclusion is opposite of what I was saying. You are returning a byte string, and I said to return a Unicode string. Did you try it my way? – Mark Ransom Dec 04 '15 at 20:42
  • @nowox I mean neither! `decode` converts a byte string to a Unicode string, and `encode` converts a Unicode string to a byte string. Just stick to Unicode strings throughout and see what happens. – Mark Ransom Dec 04 '15 at 20:49
-1

Your problem may come from your IPython configuration. Check the encoding:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

If you get ASCII as above, you may experience some issues with unicode strings.

So try this:

>>> reload(sys)
>>> sys.setdefaultencoding('utf8')

And it should work...

nowox
  • Setting or changing the default encoding may lead to severe issues with programs that expect the default one. See [here](http://stackoverflow.com/a/17628350/296974) which links to [here](https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/). – glglgl Dec 07 '15 at 06:26
  • Damn. I guess I am still confused about this encoding issue. If this is not the solution to my problem, what should I do, [glglgl](http://stackoverflow.com/users/296974/glglgl)? – nowox Dec 07 '15 at 07:32
  • The problem is that `sys.setdefaultencoding` sets the default program-wide and thus also affects parts that are not aware of the change. Instead, every place where an encode or decode happens should be told explicitly which encoding to use. – glglgl Dec 07 '15 at 09:45
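
One possible way to follow the advice in the comments, sketched under the assumption that the incoming byte strings are UTF-8: pass the encoding explicitly at each conversion point instead of changing the interpreter-wide default. The helper `to_text` is hypothetical, not part of any library, and the sketch is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding byte strings with an explicit codec."""
    if isinstance(value, bytes):
        # The codec is named here, so the default encoding is never consulted.
        return value.decode(encoding)
    return value


# Both pieces are normalized to text before they are combined.
label = u'%s %s' % (to_text(b'23.43'), to_text(b's\xc2\xb7A/m'))
```

Because the codec is named at the call site, code elsewhere in the program that relies on the default encoding is unaffected.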
-2

It is a so-called XY problem. Your first question is completely irrelevant.

Instead, you should

  • either fix your self.units to be in the right format (if I am not mistaken, you use Python 3? You should announce your Python major version in the tags...)
  • or convert it to something like

    class MyNumber(float):
        def __repr__(self):
            return str(self) + " " + str(self.units)
    

This answer is based on my guess that self.units may not be a str, but a unicode object. In that case its implicit conversion could fail because, by default, the ascii codec is used for the conversion.
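
As a rough illustration of why the ascii codec is the culprit, the following sketch decodes the UTF-8 bytes from the question with the ascii codec explicitly, which is what an implicit conversion in Python 2 would attempt. It is only an assumption that this mirrors the failing step in the asker's code, since no traceback was posted.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for the units string; 0xc2 0xb7 encodes the middle dot.
units_bytes = b's\xc2\xb7A/m'

error = None
try:
    # ASCII only covers bytes 0x00-0x7f, so 0xc2 cannot be decoded.
    units_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
```

The caught exception names the `ascii` codec and reports the offset of the first offending byte, as in the message quoted in the question.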

You should make sure not to mix up byte strings and unicode strings.

If str(self.units) doesn't work because self.units is a unicode object, you might want to replace it with self.units.encode("utf8").

To become clear about the process:

Typing a at the prompt will display the result of repr(a), somehow.

repr(a) calls a.__repr__(), checks its type (it must be str) and displays it.

It is not clear to me why a or repr(a) fails while a.__repr__() works, which is why I keep asking for the exception's traceback...
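
The process described above could be sketched as follows. This is a hypothetical stand-in for the asker's class, not their actual code: the `units` keyword and the `as_text` method are assumptions made for illustration, since the real constructor is elided in the question. The idea is to keep everything as text internally and encode only at the `__repr__` boundary on Python 2, assuming the terminal expects UTF-8.

```python
# -*- coding: utf-8 -*-
import sys


class MyNumber(float):
    """Hypothetical stand-in for the question's class: a float with units."""

    def __new__(cls, value, units=u''):
        self = super(MyNumber, cls).__new__(cls, value)
        self.units = units  # kept as text (Unicode), never as bytes
        return self

    def as_text(self):
        # float.__repr__ avoids recursing back into our own __repr__.
        return u'%s %s' % (float.__repr__(self), self.units)

    def __repr__(self):
        text = self.as_text()
        if sys.version_info[0] == 2:
            # Python 2 expects a byte string from __repr__.
            return text.encode('utf-8')
        return text


a = MyNumber('23.43', units=u's\u00b7A/m')
shown = repr(a)
```

On Python 3 the encoding branch is skipped and `__repr__` returns the text unchanged.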

glglgl
  • Doing `return "%s %s" % (str(self), self.units)` gives the same result – nowox Dec 04 '15 at 17:09
  • It's a bad idea to call `str()` without giving an encoding. Any non-ASCII characters will raise a UnicodeEncodeError. – Alastair McCormack Dec 05 '15 at 21:02
  • @AlastairMcCormack This is true for `unicode` objects. But we still don't know where exactly the error happens and what exactly happens, as we still don't see the traceback of the exception, and we still don't know if `self.units` is `str` or `unicode`, so we'll have to try an iterative approach to the solution. That's what I try to make clear in the answer although some people seem to think there's something wrong with it... – glglgl Dec 07 '15 at 06:33