
I'm aware that in Python 2 (Python < 3), encoding the string 'Plants vs. Zombies䋢 2' as UTF-8 is done like this:

u"Plants vs. Zombies䋢 2".encode("utf-8")

What if I have a variable (say appName) instead of a string literal? Can I do it like this:

appName = "Plants vs. Zombies䋢 2"
u+appName.encode("utf-8")

When I try:

appName = appName.encode('utf-8')

I get this error:

'ascii' codec can't decode byte 0xe4 in position 18: ordinal not in range(128)
Siddharthan Asokan
  • Sure, if it has `.encode` method. – freakish Nov 25 '13 at 21:05
  • Why don't you just try and see what happens? – Aleksander Lidtke Nov 25 '13 at 21:06
  • If `appName` is a unicode string then you can just use `appName.encode()`. If that *doesn't* work you don't have a `unicode` string perhaps. – Martijn Pieters Nov 25 '13 at 21:06
  • @freakish: No, not with that `u` business. – BrenBarn Nov 25 '13 at 21:09
  • @BrenBarn Surely, if he defined `u` variable. – freakish Nov 25 '13 at 21:09
  • @freakish: I think you're misunderstanding the question, although that's not surprising since the question isn't well stated. My impression is he's trying to apply the `u` to a variable as he does to a literal string like `u"blah"`, which isn't possible. – BrenBarn Nov 25 '13 at 21:11
  • @BrenBarn I understand the question. I'm making fun of it. I vote to close it, because it should "demonstrate a minimal understanding of the problem being solved" and it does not. Obviously OP does not understand what `u` (in front of a string) means. – freakish Nov 25 '13 at 21:11
  • @BrenBarn: You are confusing the syntax to create a `unicode` value in source code with existing values. `u'...'` *creates* a `unicode` string. You don't need to use `u` for existing variables, it is just special syntax to distinguish a `unicode` string from a regular string. – Martijn Pieters Nov 25 '13 at 21:18
  • @BrenBarn: Just like you can create a `list` object with square brackets (`[1, 2, 3]`) or a dictionary with curly braces (`{'foo': 'bar'}`), `u'...'` is a literal object notation. – Martijn Pieters Nov 25 '13 at 21:20
  • Even stricter than that. [] is effectively an operator which constructs a list at run time from multiple expressions, which could be anything. The u prefix doesn't accept an expression; it forces a different interpretation of the lexical token at parse time. – greggo Nov 25 '13 at 21:27
  • @MartijnPieters: I'm not, but the questioner is. – BrenBarn Nov 26 '13 at 02:57
  • @Bren: oops, those were indeed directed at the OP, not you. – Martijn Pieters Nov 26 '13 at 08:24

3 Answers


No. The u notation is only for string literals. Variables containing string data don't need the u, because the variable contains an object that is either a unicode string or a byte string. (I'm assuming here that appName contains string data; if it doesn't, it doesn't make sense to try to encode it. Convert it to a bytestring or unicode first.)

So your variable either contains a unicode string or a byte string. If it is a unicode string you can just do appName.encode("utf-8").

If it is a byte string then it is already encoded with some encoding. If it's already encoded as UTF-8, then it's already how you want it and you don't need to do anything. If it's in some other encoding and you want to get it into UTF-8, you can do appName.decode('the-existing-encoding').encode("utf-8").
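A minimal sketch of the two cases above, written so it should run on both Python 2.7 and Python 3.3+. The helper name to_utf8 is hypothetical, and the caller is assumed to know the existing encoding of a byte string, since it cannot be detected reliably.

```python
TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def to_utf8(value, existing_encoding="utf-8"):
    """Return `value` as UTF-8 encoded bytes (hypothetical helper)."""
    if isinstance(value, TEXT_TYPE):
        # A unicode string: just encode it.
        return value.encode("utf-8")
    if isinstance(value, bytes):
        # A byte string is already encoded: decode it with the encoding it is
        # really in, then re-encode. For UTF-8 input this returns equal bytes,
        # after checking that they are valid UTF-8.
        return value.decode(existing_encoding).encode("utf-8")
    raise TypeError("expected a unicode or byte string, got %s" % type(value).__name__)


# Text in, UTF-8 bytes out (\u42e2 is the escape for the 䋢 in the title).
title_bytes = to_utf8(u"Plants vs. Zombies\u42e2 2")
# Bytes in some other encoding (cp1252 assumed here), converted to UTF-8.
cafe_bytes = to_utf8(u"caf\xe9".encode("cp1252"), "cp1252")
```

The explicit existing_encoding parameter reflects the point above: a byte string carries no record of its encoding, so the decode step needs it from outside.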

Note that if you do what you show in your edited question, the result might not be what you expect. You have:

appName = "Plants vs. Zombies䋢 2"

Without the u on the string literal, you have created a bytestring in some encoding, namely the encoding of your source file. If your source file isn't in UTF-8, then you're in the last situation I described above. There is no way to "just make a string unicode" after you have created it as non-unicode. When you create it as non-unicode, you are creating it in a particular encoding, and you have to know what encoding that is in order to decode it to unicode (so you can then encode it to another encoding if you want).
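A small illustration of this, assuming the source file was saved as UTF-8. The bytes are spelled with escape sequences so the snippet itself does not depend on a coding declaration. In Python 2, calling .encode() on a byte string first decodes it with the default (ASCII) codec; the sketch performs that decode explicitly, which reproduces the error from the question on both Python 2 and 3.

```python
# The bytes a UTF-8 source file would store for "Plants vs. Zombies䋢 2".
raw = b"Plants vs. Zombies\xe4\x8b\xa2 2"

# The implicit ASCII decode that Python 2's str.encode() performs, made explicit.
try:
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding with the encoding the bytes are really in recovers the text...
right = raw.decode("utf-8")
# ...while guessing wrong still "succeeds" but yields different characters.
wrong = raw.decode("latin-1")
print(right == u"Plants vs. Zombies\u42e2 2")  # True
print(right == wrong)  # False
```

Latin-1 maps every byte to some character, so a wrong guess raises no error; that is why you have to know the encoding rather than try one and see.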

BrenBarn
  • `Your variable either contains a unicode string or a byte string` Where is that stated? – freakish Nov 25 '13 at 21:07
  • @freakish: It doesn't say it, but it's implied by the way he's using `u` and by his example with a string literal. I clarified my answer to say I'm assuming `appName` contains string data. – BrenBarn Nov 25 '13 at 21:08

No. The u prefix modifies the meaning of a string constant (making it a unicode constant). It is not an operator, which could be applied to any expression.
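A short sketch of the difference, assuming Python 2 or Python 3.3+ (where the u prefix is accepted again, per PEP 414). The names app_name and alias are illustrative only.

```python
TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3

# The parser reads u"..." as one literal token that produces a text object.
app_name = u"Plants vs. Zombies\u42e2 2"
alias = app_name  # a variable just references that object; no prefix involved
assert isinstance(alias, TEXT_TYPE)

# A bare `u` is an ordinary identifier, so this tries to add a variable
# named u to app_name and fails because no such variable exists.
try:
    u + app_name
    prefix_worked = True
except NameError:
    prefix_worked = False

print(prefix_worked)  # False
```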

greggo

I think you can try the following:

s = "Plants vs. Zombies䋢 2"
unicode(s, errors='ignore').encode('ascii')

This converts the byte string to the unicode type using the default 'ascii' codec; errors='ignore' makes it silently discard any bytes it can't decode (here, the bytes of 䋢). Encoding the result with 'ascii' then turns it back into a normal str.

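Update for Python 3 (here s must be a bytes object, because str has no decode method; for a str, use s.encode('ascii', 'ignore') instead):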

s.decode('ascii', 'ignore').encode('ascii')
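A quick check of what this approach keeps, written to run on Python 2.7 and 3.3+; raw and text are illustrative names for the UTF-8 bytes and the decoded title. With 'ignore', the non-ASCII character is dropped rather than converted.

```python
raw = b"Plants vs. Zombies\xe4\x8b\xa2 2"  # UTF-8 encoded bytes
text = u"Plants vs. Zombies\u42e2 2"       # the same title as text

# Bytes in: decode as ASCII, dropping every byte >= 0x80, then encode back.
from_bytes = raw.decode("ascii", "ignore").encode("ascii")
# Text in (str on Python 3, unicode on Python 2): encode directly,
# dropping every non-ASCII character.
from_text = text.encode("ascii", "ignore")

# Both give b"Plants vs. Zombies 2": the 䋢 is gone, not transliterated.
print(from_bytes == from_text == b"Plants vs. Zombies 2")  # True
```

If the character must survive, encoding to UTF-8 instead of ASCII avoids the loss.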

https://docs.python.org/2/howto/unicode.html

Best way to convert string to bytes in Python 3?

Yang_2333