
I want to concatenate a list of various Python objects into one string. The objects can be literally anything. I thought I could simply do this using the following code:

' '.join([str(x) for x in the_list])

but unfortunately that sometimes gives me a UnicodeEncodeError:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 80: ordinal not in range(128)

In this SO answer I found someone saying that I need to use .encode('utf-8'), so I changed my code to this:

' '.join([x.encode('utf-8') for x in the_list])

But if the objects are not strings or unicode objects but, for example, ints, I get an AttributeError: 'int' object has no attribute 'encode'. So it seems I need some kind of if-statement to check each object's type and decide how to convert it. But when should I use .encode('utf-8') and when should I use str()?

It would be even better if I could do this in a one-liner, but I don't know how. Does anybody know? All tips are welcome!

kramer65
  • What do you want this conversion to do, aside from "produce a string"? Presumably, the result should be somehow representative of the original object, but how much does it matter exactly what string gets produced? – user2357112 Dec 17 '15 at 17:03
  • @user2357112 - Because it's mainly for logging purposes, it doesn't really matter too much how close it comes. – kramer65 Dec 18 '15 at 13:05
  • Then why not take your list and print that? – user2357112 Dec 18 '15 at 15:43

3 Answers


In Python 2.x, use repr(). In Python 3.x, use repr() if you don't mind non-ASCII Unicode characters in the result, or ascii() if you do:

>>> a=1             # integer
>>> class X: pass
...
>>> x=X()           # class instance
>>> y='\u5000'      # Unicode string
>>> z=b'\xa0'       # non-ASCII byte string
>>> ' '.join(ascii(i) for i in (a,x,y,z))
"1 <__main__.X object at 0x0000000002974B38> '\\u5000' b'\\xa0'"
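If you need this in more than one place, the same idea can be wrapped in a small helper. This is a sketch for Python 3 only; the name join_for_log is hypothetical and not part of any library. Because ascii() never produces non-ASCII characters, the joined result can be written with any output encoding without a UnicodeEncodeError.

```python
def join_for_log(items, sep=' '):
    """Join arbitrary objects into one ASCII-only string (Python 3 only).

    ascii() works like repr() but escapes every non-ASCII character,
    so the result is safe to write with any output encoding.
    """
    return sep.join(ascii(item) for item in items)


class X:
    pass


# Mixed list: int, class instance, Unicode strings and a byte string.
line = join_for_log([1, X(), '\u5000', b'\xa0', 'pingüino'])
print(line)
```

Note that strings come out quoted, because ascii() returns a repr()-style representation rather than the raw text.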

Examples of the differences between Python 2.x repr(), Python 3.x repr(), and Python 3.x ascii():

>>> # Python 3
>>> s = 'pingüino' # Unicode string
>>> s
'pingüino'
>>> repr(s)
"'pingüino'"
>>> print(repr(s))
'pingüino'
>>> ascii(s)
"'ping\\xfcino'"
>>> print(ascii(s))
'ping\xfcino'

>>> # Python 2
>>> s = u'pingüino'
>>> s
u'ping\xfcino'
>>> repr(s)
"u'ping\\xfcino'"
>>> print(repr(s))
u'ping\xfcino'
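The Python 3 half of the comparison above can also be checked programmatically. A minimal sketch (Python 3.7+, since it uses str.isascii()): repr() keeps the ü as-is, while ascii() replaces it with the \xfc escape, so only ascii() output is guaranteed to be pure ASCII.

```python
s = 'pingüino'  # Unicode string with one non-ASCII character, U+00FC

r = repr(s)   # canonical representation; non-ASCII characters kept as-is
a = ascii(s)  # like repr(), but non-ASCII escaped as \xhh, \uhhhh or \Uhhhhhhhh

print(a)                         # 'ping\xfcino'
print(a.isascii(), r.isascii())  # True False
```

Printing r instead would require an output encoding that can represent ü, which is exactly the situation that raised the original UnicodeEncodeError.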
Mark Tolonen
  • To clarify, per `repr()`'s docstring, it returns the canonical string representation of the object. So whatever you would see printed in the console for an object, e.g. a class reference, a list, or anything else, is turned into a string. – Reti43 Dec 17 '15 at 17:00
  • Where do you get this `ascii()` function from? If I try `ascii('something')` I get `NameError: name 'ascii' is not defined`. I tried importing it and searching around for it, but I can't find any mention of such a function. Any more tips? – kramer65 Dec 18 '15 at 07:50
  • @kramer65, `ascii()` is Python 3.x only. It works like `repr()` on Python 2.x. `repr()` on Python 3.x will display printable non-ASCII when supported by the output encoding, making it easier for languages other than English to read the output. – Mark Tolonen Dec 18 '15 at 20:25
  • Thanks for the explanation. I'm so into Python 2.7 that I didn't even think of the possibility of Python 3. Thanks for pointing that out. Since most libraries have now been ported, I might consider starting with Python 3 for upcoming projects. – kramer65 Dec 21 '15 at 10:46

You can try joining with a unicode object instead:

u' '.join(unicode(x) for x in the_list)

Alternatively, what you had before will work fine in Python 3. Either way, be sure to:

  1. decode early
  2. unicode everywhere
  3. encode late
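The three rules above can be sketched in Python 3 as follows. This is an illustration under assumptions: the function names read_item and write_line are hypothetical, UTF-8 is assumed as the encoding at both boundaries, and an in-memory io.BytesIO stands in for a real file or socket.

```python
import io


def read_item(raw: bytes) -> str:
    # 1. Decode early: turn incoming bytes into str at the boundary.
    # Assumes the input is UTF-8; undecodable bytes become U+FFFD.
    return raw.decode('utf-8', errors='replace')


def write_line(stream, items) -> None:
    # 2. Unicode everywhere: in Python 3, str() on any object returns text.
    text = ' '.join(str(x) for x in items)
    # 3. Encode late: convert back to bytes only when writing out.
    stream.write(text.encode('utf-8') + b'\n')


out = io.BytesIO()
write_line(out, [1, read_item(b'ping\xc3\xbcino'), 2.5, None])
print(out.getvalue())  # b'1 ping\xc3\xbcino 2.5 None\n'
```

Keeping all the conversions at the edges means the code in between never has to ask whether a value needs str() or .encode().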

For more details, see this talk.

Chad S.
  • That doesn't always work, for example `unicode('ü')` leads to a `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)`. Any other idea? – kramer65 Dec 18 '15 at 13:24
  • I tried running a file which contains two lines: `# -*- coding: utf-8 -*-` and `print unicode('ü')`, but I still get the `UnicodeDecodeError`. – kramer65 Dec 18 '15 at 17:14
  • Sorry. You need to be using unicode objects. Either do `print u'ü'` or if you must use a string you will have to decode it with `print 'ü'.decode('utf8')` – Chad S. Dec 18 '15 at 17:41

You could try combining a conditional expression (Python's ternary operator) with your current one-liner. Also, join works fine with a generator expression, so you don't need to build the list first. Something like:

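# Encode only unicode objects; byte strings and everything else go through str().
# (Testing basestring would also call .encode() on byte strings, which first
# decodes them as ASCII and fails on non-ASCII bytes.)
' '.join(x.encode('utf-8') if isinstance(x, unicode) else str(x)
         for x in the_list)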
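On Python 3 the same conditional-expression idea still works, but the type to special-case is bytes rather than the Python 2 string types: str() already returns text for everything else, and byte strings need decoding instead of encoding. A sketch, assuming any byte strings in the list are UTF-8 (the sample list is made up for illustration):

```python
the_list = [1, 'pingüino', b'caf\xc3\xa9', 3.0, None]

# Decode bytes (assumed UTF-8; invalid bytes are replaced), str() everything else.
line = ' '.join(x.decode('utf-8', 'replace') if isinstance(x, bytes) else str(x)
                for x in the_list)
```

If the result has to go somewhere that only accepts bytes, encode it once at the end, e.g. line.encode('utf-8').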
James J.