Python string prints as [u'String']

Question

This will surely be an easy one but it is really bugging me.

I have a script that reads in a webpage and uses Beautiful Soup to parse it. From the soup I extract all the links as my final goal is to print out the link.contents.

All of the text that I am parsing is ASCII. I know that Python treats strings as unicode, and I am sure this is very handy, just of no use in my wee script.

Every time I go to print out a variable that holds 'String' I get [u'String'] printed to the screen. Is there a simple way of getting this back into just ascii or should I write a regex to strip it?

possible duplicate of the much more clearly worded question (and answer): https://stackoverflow.com/q/2464959/1390788 — Terrabits, Jun 21 '20 at 23:09
Does this answer your question? [What's the u prefix in a Python string?](https://stackoverflow.com/questions/2464959/whats-the-u-prefix-in-a-python-string) — Terrabits, Jun 21 '20 at 23:10

oefe · Accepted Answer · 2009-03-01T11:40:01.250

130

[u'ABC'] would be a one-element list of unicode strings. Beautiful Soup always produces Unicode. So you need to convert the list to a single unicode string, and then convert that to ASCII.

I don't know exactly how you got the one-element lists; the contents member would be a list of strings and tags, which is apparently not what you have. Assuming that you really always get a list with a single element, and that your text is really only ASCII, you would use this:

 soup[0].encode("ascii")

However, please double-check that your data is really ASCII. This is pretty rare. Much more likely it's latin-1 or utf-8.

 soup[0].encode("latin-1")


 soup[0].encode("utf-8")

Or you can ask Beautiful Soup what the original encoding was and get the text back in that encoding:

 soup[0].encode(soup.originalEncoding)
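
As a rough illustration of why the choice of encoding matters, here is a minimal sketch using plain Python strings rather than real Beautiful Soup output. The names `items` and `non_ascii` are made up for the example, and on Python 3 `encode()` returns `bytes` rather than a `str`.

```python
# Hypothetical stand-in for the one-element list from the question.
items = [u'ABC']

# Pure ASCII text encodes without trouble.
encoded = items[0].encode('ascii')

# Text with a non-ASCII character (e with acute accent) cannot be encoded as ASCII.
non_ascii = u'caf\xe9'
try:
    non_ascii.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# latin-1 and utf-8 can both represent it, but with different bytes.
latin1 = non_ascii.encode('latin-1')
utf8 = non_ascii.encode('utf-8')
```

So `encode("ascii")` is only safe when the data really is ASCII; otherwise it raises `UnicodeEncodeError`.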

edited Mar 01 '09 at 11:40

answered Mar 01 '09 at 11:22

oefe


6

You actually don't have to do the encoding, because the OP is only seeing the string repr because that's how you see anything when you print a list. Printing soup[0] will be enough to show the str instead of the repr, showing the contents of the string and not the quotes and unicode modifier. – ironfroggy Mar 01 '09 at 13:36
2

You shouldn't encode the text represented as Unicode to bytes in most cases: you should print Unicode directly in Python: [`print(', '.join([u'ABC' , u'...']))`](http://stackoverflow.com/a/36891685/4279) – jfs Jun 12 '16 at 17:20
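
A small sketch of what the two comments above suggest, assuming a list of Unicode strings like the one in the question (the name `items` is invented here): print an element, or join the elements, instead of printing the list itself.

```python
# Hypothetical list of Unicode strings, as in the question.
items = [u'ABC', u'DEF']

# Printing one element shows its contents, without brackets, quotes or the u prefix.
first = items[0]
print(first)

# Joining the elements gives one string that can be printed directly.
line = u', '.join(items)
print(line)
```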

score 27 · Answer 2 · answered Mar 01 '09 at 11:40


You probably have a list containing one unicode string. The repr of this is [u'String'].

You can convert this to a list of byte strings using any variation of the following:

# Functional style.
print map(lambda x: x.encode('ascii'), my_list)

# List comprehension.
print [x.encode('ascii') for x in my_list]

# Interesting if my_list may be a tuple or a string.
print type(my_list)(x.encode('ascii') for x in my_list)

# What do I care about the brackets anyway?
print ', '.join(repr(x.encode('ascii')) for x in my_list)

# That's actually not a good way of doing it.
print ' '.join(repr(x).lstrip('u')[1:-1] for x in my_list)

answered Mar 01 '09 at 11:40

ddaa


1

Please, avoid such horrors as `repr(x).lstrip('u')[1:-1]`. Use something like: `print ", ".join(my_list)` instead, to format a list of Unicode strings. – jfs Apr 27 '16 at 13:46
2

The comment, it says: "That's actually not a good way of doing it". It's just here for the lolz! – ddaa Apr 27 '16 at 14:54

score 13 · Answer 3 · edited Oct 17 '16 at 16:26


import json, ast
r = {u'name': u'A', u'primary_key': 1}
ast.literal_eval(json.dumps(r))

will print

{'name': 'A', 'primary_key': 1}
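
One caveat worth knowing, shown as a hedged sketch: the round trip only works when the JSON text also happens to be a valid Python literal. JSON writes booleans and None as `true`, `false` and `null`, which `ast.literal_eval` rejects. The helper name `to_plain` is made up for this example.

```python
import ast
import json


def to_plain(obj):
    """Serialize to JSON text, then parse that text as a Python literal."""
    return ast.literal_eval(json.dumps(obj))


# Works: strings and numbers look the same in JSON and in Python.
simple = to_plain({u'name': u'A', u'primary_key': 1})

# Fails: json.dumps writes True as `true`, which is not a Python literal.
try:
    to_plain({u'active': True})
    roundtrip_ok = True
except ValueError:
    roundtrip_ok = False
```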

edited Oct 17 '16 at 16:26

Anko - inactive in protest


answered Oct 17 '16 at 15:30

osmjit


3

this method looks pretty sweet to me, why no votes? any performance impact we should worry about? – jrich523 Jul 12 '17 at 23:58
Just using `import json` and then `print json.dumps(myVar)` did the trick for me, thanks! – ArendE Aug 31 '20 at 15:06
ast is not even needed in my case. json.dumps solved my case. Thanks – Gökhan Polat Mar 09 '21 at 10:56

score 10 · Answer 4 · answered Feb 09 '13 at 06:21


If accessing/printing single element lists (e.g., sequentially or filtered):

my_list = [u'String'] # sample element
my_list = [str(my_list[0])]
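
For lists with more than one element, the same idea can be sketched with a list comprehension over the items (the sample values are invented). Note that on Python 2 `str()` encodes with ASCII and raises `UnicodeEncodeError` for non-ASCII text, while on Python 3 the strings are already `str` and come back unchanged.

```python
# Hypothetical sample data.
my_list = [u'String', u'Other']

# Convert every element, not just the first one.
plain = [str(item) for item in my_list]
```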

answered Feb 09 '13 at 06:21

gevang


1

you do a list comprehension: `my_list = [str(my_list[x]) for x in range(len(my_list))]` – gevang Jun 15 '16 at 16:40

waweru · Answer 5 · 2021-05-14T20:20:12.837

5

Pass the output to the str() function and it will remove the unicode marker u''. Printing the output also removes the u'' marker.

edited May 14 '21 at 20:20

answered Apr 28 '13 at 11:14

waweru


score 4 · Answer 6 · answered Apr 27 '16 at 13:45

[u'String'] is a text representation of a list that contains a Unicode string on Python 2.

If you run print(some_list) then it is equivalent to
print '[%s]' % ', '.join(map(repr, some_list)), i.e., to create a text representation of a Python object of type list, the repr() function is called for each item.

Don't confuse a Python object and its text representation—repr('a') != 'a' and even the text representation of the text representation differs: repr(repr('a')) != repr('a').

repr(obj) returns a string that contains a printable representation of an object. Its purpose is to be an unambiguous representation of an object that can be useful for debugging, in a REPL. Often eval(repr(obj)) == obj.

To avoid calling repr(), you could print list items directly (if they are all Unicode strings), e.g., print ",".join(some_list), which prints a comma-separated list of the strings: String

Do not encode a Unicode string to bytes using a hardcoded character encoding, print Unicode directly instead. Otherwise, the code may fail because the encoding can't represent all the characters e.g., if you try to use 'ascii' encoding with non-ascii characters. Or the code silently produces mojibake (corrupted data is passed further in a pipeline) if the environment uses an encoding that is incompatible with the hardcoded encoding.
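
The claims in this answer can be checked with a short sketch (the list contents are made up). It is written so that it runs on both Python 2 and Python 3; only the `u` prefix in the repr differs between the two.

```python
# Hypothetical list of Unicode strings.
some_list = [u'String', u'Other']

# The text form of a list is built from the repr() of each item.
manual = '[%s]' % ', '.join(map(repr, some_list))
same_as_str = (manual == str(some_list))

# An object and its text representation are different things.
repr_differs = (repr('a') != 'a')
nested_repr_differs = (repr(repr('a')) != repr('a'))

# For simple objects, evaluating the repr gives back an equal object.
round_trips = (eval(repr(some_list)) == some_list)

# Joining the items avoids repr() entirely.
joined = u','.join(some_list)
```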

score 4 · Answer 7 · edited Sep 18 '14 at 00:52


Do you really mean u'String'?

In any event, can't you just do str(string) to get a string rather than a unicode-string? (This should be different for Python 3, for which all strings are unicode.)

edited Sep 18 '14 at 00:52

hichris123


answered Mar 01 '09 at 11:01

Andrew Jaffe


I should have been clearer. I am using str() but still getting output like below when I print. [u'ABC'] [u'DEF'] [u'GHI'] [u'JKL'] The data is stripped as text from a webpage, then inserted into a database (Google Appstore), then retrieved and printed. – gnuchu Mar 01 '09 at 11:09

score 3 · Answer 8 · answered Mar 01 '09 at 11:14


Use dir or type on the 'string' to find out what it is. I suspect that it's one of BeautifulSoup's tag objects, which prints like a string but really isn't one. Otherwise, it's inside a list and you need to convert each string separately.

In any case, why are you objecting to using Unicode? Any specific reason?
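
A minimal sketch of that kind of inspection, using only built-ins. The helper name `describe` is hypothetical and the sample value stands in for whatever the script actually holds.

```python
def describe(value):
    """Report whether a value is a sequence and what its items are."""
    if isinstance(value, (list, tuple)):
        item_types = ', '.join(type(item).__name__ for item in value)
        return 'sequence of ' + item_types
    return type(value).__name__


# A list wrapping one string is reported as a sequence, not as a string.
wrapped = describe([u'ABC'])
bare = describe(42)
```

If the result starts with "sequence of", the value is a list and each item has to be handled separately.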

answered Mar 01 '09 at 11:14

sykora


I've been looking at BeautifulSoup for the last few days. I couldn't figure out how gnuchu would get u['string'] not [u'String']. His comment to Andrew Jaffe seems to prove it is a list. – batbrat Mar 01 '09 at 11:54

score -3 · Answer 9 · edited Apr 11 '15 at 23:43


encode("latin-1") helped me in my case:

facultyname[0].encode("latin-1")

edited Apr 11 '15 at 23:43

Undo


answered Apr 11 '15 at 23:30

user1519904

