Why is there a 'u' before every line of my output?

Question

Just wondering what the significance of the 'u' before every string in my output is, and how I can remove it? I'm working in Python.

Last login: Mon Jul  1 09:58:27 on ttys000
Samuel-Finegolds-MacBook-Pro:~ samuelfinegold$ /var/folders/jv/9_sy0bn10mbdft1bk9t14qz40000gn/T/Cleanup\ At\ Startup/tutor-394379967.500.py.command ; exit;
{u'company': {u'address': {u'city': u'Chicago',
                           u'contactname': '',
                           u'geo': {u'latitude': u'41.92113',
                                    u'longitude': u'-87.70085'},
                           u'state': u'IL',
                           u'street_address': '',
                           u'zip': u'60647'},
              u'companyname': u'Wyzant',
              u'costtype': '',
              u'description': u'WyzAnt is the leading tutoring marketplace on the web with 67,000+ tutors offering private lessons in hundreds of subjects like math, science, test prep, foreign languages, music, computers and much more.',
              u'email': '',
              u'facebook': u'https://www.facebook.com/WyzAnt',
              u'image': '',
              u'language': '',
              u'linkedin': '',
              u'logo': '',
              u'phone': u'8779992681',
              u'program': {u'costrange': u'[]',
                           u'costtype': '',
                           u'programtype': ''},

Guessing you're calling `json.dumps`, look in this question: http://stackoverflow.com/questions/16261174/json-output-s-just-print-the-output-withou-u?rq=1 — maksimov, Jul 01 '13 at 14:15
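If the goal is only to display the data without the u markers, one option is to serialize it back to JSON text. This is a minimal sketch. It assumes the output above comes from printing a dict of strings parsed from JSON, and the `data` dict is a hypothetical, trimmed stand-in for the real one:

```python
import json

# Hypothetical, trimmed stand-in for the dict printed in the question.
data = {u'company': {u'address': {u'city': u'Chicago', u'state': u'IL'},
                     u'companyname': u'Wyzant'}}

# json.dumps turns the dict back into JSON text. Strings come out in
# double quotes without any u prefix, on both Python 2 and Python 3.
text = json.dumps(data, indent=2, sort_keys=True)
print(text)
```

This changes only how the data is displayed. The values inside `data` are still unicode strings.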

Robert Kajic · Accepted Answer · 2013-07-01T14:26:51.623

u is used to create unicode strings:

>>> unicode_string = u'my unicode string'
>>> type(unicode_string)
<type 'unicode'>
>>> ascii_string = 'my ascii string'
>>> type(ascii_string)
<type 'str'>
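The session above is Python 2. Here is a sketch of the same checks under Python 3.3 or later for comparison. In Python 3, str is already Unicode, the u prefix is accepted but changes nothing, and raw bytes get their own type with a b prefix:

```python
# Python 3.3+: the u prefix is allowed but has no effect.
unicode_string = u'my unicode string'
plain_string = 'my unicode string'
print(type(unicode_string))            # <class 'str'>
print(unicode_string == plain_string)  # True

# Byte strings are a separate type, written with a b prefix.
byte_string = b'my byte string'
print(type(byte_string))               # <class 'bytes'>
```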

You can convert a unicode string to a plain str using str:

>>> converted_string = str(unicode_string)
>>> type(converted_string)
<type 'str'>

However, this is only possible if every character in your unicode string can be represented in ASCII:

>>> unicode_string = u'ö'
>>> converted_string = str(unicode_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
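You do not have to rely on the implicit ASCII codec that str() uses. You can encode explicitly and choose both the codec and the error handling yourself. This is a minimal sketch that assumes UTF-8 is the encoding you want for the resulting bytes:

```python
unicode_string = u'\xf6'  # the character ö (U+00F6)

# Explicit UTF-8 encoding works for any character.
utf8_bytes = unicode_string.encode('utf-8')
print(repr(utf8_bytes))   # '\xc3\xb6' on Python 2, b'\xc3\xb6' on Python 3

# If you really need ASCII, pick an error handler instead of crashing:
# 'replace' substitutes '?' for each character ASCII cannot represent.
ascii_bytes = unicode_string.encode('ascii', 'replace')
print(repr(ascii_bytes))  # '?' on Python 2, b'?' on Python 3
```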

You can read more about Python's unicode strings at http://docs.python.org/2/howto/unicode.html

Ashwini Chaudhary · Answer 2 · 2013-07-01T14:45:53.777

The u means it's a unicode string. If the string contains only ASCII characters, there's no need to convert it to a normal str, because the two compare equal:

>>> "foo" == u"foo"
True

But you can't compare a unicode string with a byte string that contains non-ASCII characters:

>>> u'ö' == 'ö'
False
>>> 'ö'       #contains bytes
'\xc3\xb6'
>>> u'ö'      #contains sequence of code-points 
u'\xf6'

The comparison only works if you first decode the byte string to unicode (using the proper encoding):

>>> u'ö' == 'ö'.decode('utf-8')
True
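A common way to apply this is to normalize every value to unicode before comparing. Below is a sketch that uses a hypothetical helper, `to_text`. It assumes the byte strings are UTF-8 encoded:

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return text, decoding byte strings first.

    On Python 2, bytes is an alias for str, so this decodes plain str
    values; on Python 3 it decodes bytes objects.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both sides are text after normalization, so the comparison is meaningful.
print(to_text(b'\xc3\xb6') == to_text(u'\xf6'))  # True
```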

Docs: Unicode HOWTO

Ned Batchelder's slides: Pragmatic Unicode, or, How Do I Stop the Pain?

edited Jul 01 '13 at 14:45

answered Jul 01 '13 at 14:12

Ashwini Chaudhary

@downvoter, care to explain your downvote so that I can improve my answer? – Ashwini Chaudhary Jul 01 '13 at 14:16
Didn't downvote you, but "won't affect your output in any way" is a bit of a stretch, don't you think? – rantanplan Jul 01 '13 at 14:19
@rantanplan Can you give me an example where my sentence could be wrong? – Ashwini Chaudhary Jul 01 '13 at 14:24
Do this: go to your terminal and create 2 variables holding the same string, in the non-ASCII range (Greek, Russian, etc.). In one string you will prepend the `u`, in the other you won't. Then compare them with an equality test. They seem the same... yet so totally different! – rantanplan Jul 01 '13 at 14:28
@rantanplan see my updated solution. – Ashwini Chaudhary Jul 01 '13 at 14:46
@rantanplan Thanks for the constructive criticism. :) – Ashwini Chaudhary Jul 01 '13 at 14:54

Markus Meskanen · Answer 3 · 2013-07-01T14:18:38.973

The lowercase u in front of a string means it's a unicode string. It's only the encoding, so it does no harm at all. Unicode strings can represent a wider variety of characters (such as £) than normal strings, and the u isn't shown when you print them:

>>> print(u'hi')
hi
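The u appears in the question because printing a dict or list displays the repr() of each element, while printing a single string uses str(). The following sketch shows the difference. The comments give the Python 3 output, and the parentheses note what Python 2 shows instead:

```python
s = u'hi'
print(s)        # hi
print(repr(s))  # 'hi'  (u'hi' on Python 2)

# Containers display the repr() of their contents, which is where the
# u'...' in the question's dict output comes from on Python 2.
d = {u'city': u'Chicago'}
print(d)        # {'city': 'Chicago'}  ({u'city': u'Chicago'} on Python 2)
```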

You can learn more about unicode strings from the Python documentation: http://docs.python.org/3/howto/unicode.html

edited Jul 01 '13 at 14:18

answered Jul 01 '13 at 14:12

Markus Meskanen

"It's only the encoding, and therefore is no harm at all". Are you serious people? What's with the "it's only encoding" meme in here? Encoding is the source of the *most* weird problems. Don't downplay it. Everyone should study the differences between bytestrings and unicode. – rantanplan Jul 01 '13 at 14:33

score 2 · Answer 4 · answered Jul 01 '13 at 14:19

To get rid of the u, convert the unicode string to a plain str with str():

    >>> x = u'abcd'
    >>> type(x)
    <type 'unicode'>
    >>> y = str(x)
    >>> type(y)
    <type 'str'>
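str() converts only one string at a time, and it fails on non-ASCII text with the default ASCII codec. Nested data like the dict in the question needs a recursive approach. Below is a sketch of a hypothetical recursive helper, `to_native`. It assumes UTF-8 is an acceptable encoding for the resulting Python 2 byte strings. On Python 3 it leaves strings unchanged:

```python
import sys

PY2 = sys.version_info[0] == 2


def to_native(obj):
    """Hypothetical helper: recursively convert unicode strings in nested
    dicts and lists to the native str type.

    On Python 2 unicode values are encoded as UTF-8 bytes; on Python 3
    str is already Unicode, so strings are returned unchanged.
    """
    if isinstance(obj, dict):
        return dict((to_native(k), to_native(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return [to_native(item) for item in obj]
    if PY2 and isinstance(obj, unicode):  # noqa: F821 (name exists only on Python 2)
        return obj.encode('utf-8')
    return obj


data = {u'company': {u'city': u'Chicago', u'tags': [u'math', u'music']}}
print(to_native(data))
```

On Python 2, non-ASCII values will then print as escaped bytes such as '\xc3\xb6'. For display purposes, printing individual strings or using json.dumps is often simpler.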

answered Jul 01 '13 at 14:19

WiData