
I've learned how to send tweets with Python, but I'm wondering if it's possible to send emojis or other special Unicode characters in the tweets.

For example, when I try to tweet u'1F430', it simply shows up as "1F430" in the tweet.

Anand S Kumar
codycrossley
  • '1F430' is still a series of five alphanumeric characters whether you mark it as unicode or not. What character are you actually trying to send? – Daniel Roseman Aug 12 '15 at 14:34
  • you probably mean `'\U0001F430'` (🐰)? – mata Aug 12 '15 at 14:36
  • That was just an example, but that '1F430' should be a bunny emoji. How do I get a computer to read that as one character then? – codycrossley Aug 12 '15 at 14:37
  • @mata, yes! How should I pass that into Python so that it reads it how I want it to? EDIT: Never mind, your answer actually answers that. Thank you so much! – codycrossley Aug 12 '15 at 14:37
  • @codycrossley do you use python2 or python3? there are a lot of differences regarding unicode handling between those versions, and there are different [possible escape sequences](https://docs.python.org/3/howto/unicode.html#unicode-literals-in-python-source-code), which can be used depending on the needed byte size for the unicode code point... – mata Aug 12 '15 at 14:48
  • @mata, I generally use python2, but will eventually make the switch to python3. Thank you for the reference! – codycrossley Aug 12 '15 at 15:09

2 Answers

>>> len(u'1f430')
5
>>> len(u'\U0001F430')  # may be 2 in Python 2 on a narrow build (Windows, OS X)
1

The former is 5 characters, the latter is a single character.
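The "1F430" from the question is the code point written in hex. A minimal sketch, assuming Python 3.3+ (where `chr` accepts any code point; Python 2's counterpart is `unichr`, which is limited on narrow builds), that turns that hex string into the single character and shows why a narrow build reports a length of 2:

```python
# Sketch assuming Python 3.3+, where len() of a str counts code points.
hex_code = "1F430"  # the text from the question: just five characters

# int(..., 16) parses the hex digits; chr() turns the code point into a character.
bunny = chr(int(hex_code, 16))

print(len(hex_code))          # 5
print(len(bunny))             # 1
print(bunny == "\U0001F430")  # True

# A narrow Python 2 build stores text as UTF-16 code units. U+1F430 lies
# outside the BMP, so it needs a surrogate pair, i.e. two units.
utf16 = bunny.encode("utf-16-be")
units = [hex(int.from_bytes(utf16[i:i + 2], "big")) for i in range(0, len(utf16), 2)]
print(units)                  # ['0xd83d', '0xdc30']
```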

If you want to specify the character in Python source code, you could use its name for readability:

>>> print(u"\N{RABBIT FACE}")
🐰

Note: this might not work in the Windows console. To display non-BMP Unicode characters there, you could use win-unicode-console + ConEmu.
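The `\N{...}` escape is backed by the Unicode character database, which the standard `unicodedata` module also exposes. A small sketch, assuming Python 3, that goes from name to character and back; `ascii()` is included as one way to get a console-safe, escaped representation when the terminal cannot render the emoji:

```python
import unicodedata

# \N{...} in a literal and unicodedata.lookup() resolve the same character name.
bunny = "\N{RABBIT FACE}"
assert bunny == unicodedata.lookup("RABBIT FACE") == "\U0001F430"

# The reverse direction: the official name and the code point of a character.
print(unicodedata.name(bunny))  # RABBIT FACE
print("U+%04X" % ord(bunny))    # U+1F430

# ascii() escapes non-ASCII characters, so this prints safely on any console.
print(ascii(bunny))             # '\U0001f430'
```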

If you are reading it from a file, the network, etc., then this character is no different from any other: to decode bytes into Unicode text, you should specify a character encoding, e.g.:

import io

with io.open('filename', encoding='utf-8') as file:
    text = file.read()

Which specific encoding to use depends on the source; e.g., see "A good way to get the charset/encoding of an HTTP response in Python".
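To see the decoding step end to end, here is a sketch assuming Python 3 and UTF-8 input. The file contents ("hop " plus the bunny) and the temporary file are made up for illustration; the point is that the same four bytes become one character only when they are decoded with the right encoding:

```python
import io
import os
import tempfile

# UTF-8 encodes U+1F430 as four bytes.
raw = "\U0001F430".encode("utf-8")
print(raw)  # b'\xf0\x9f\x90\xb0'

# Write the bytes to a throwaway file, then read them back as text.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"hop " + raw)
    with io.open(path, encoding="utf-8") as f:
        text = f.read()
finally:
    os.remove(path)

print(len(text))                   # 5: 'h', 'o', 'p', ' ' and the bunny
print(text == "hop \U0001F430")    # True

# Decoding with the wrong codec silently gives different text (mojibake).
print(len(raw.decode("latin-1")))  # 4
```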

Community
jfs

u'1F430' is the literal string "1F430". What character are you trying to get? In general, you can put literal bytes into a Python string using an escape such as "\x20", e.g.:

>>> print(b"#\x20#")  # Python 2 output; Python 3 prints b'# #'
# #

This is the byte with hexadecimal value 20 (decimal 32) between two hashes. Interpreted as ASCII, byte 0x20 is a space.
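In Python 3, bytes and text are separate types, which is why printing a bytes object shows its `b'...'` form rather than the characters. A short sketch, assuming Python 3, of keeping the two apart and decoding explicitly:

```python
# Python 3: bytes and str are distinct types.
data = b"#\x20#"
print(data == b"# #")  # True: \x20 is just another spelling of the byte 0x20
print(str(data))       # b'# #'  (this is what print(data) shows in Python 3)

# To get text, decode explicitly with a named encoding instead of relying on a default.
text = data.decode("ascii")
print(text)            # # #
print(type(data).__name__, type(text).__name__)  # bytes str
```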

>>> print(u"#\u0020#")
# #
>>> print(u"#\U0001F430#")
#🐰#

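The first example is Unicode code point U+0020 (a single space) between two hashes; the second is U+1F430 (RABBIT FACE). Code points above U+FFFF need the eight-digit \U escape, because \u accepts exactly four hex digits.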
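A quick way to see why the four-digit form is not enough for this character, assuming Python 3 string-literal rules: `\u` consumes exactly four hex digits, so `"\u1F430"` silently becomes two characters:

```python
# \u takes exactly four hex digits, \U exactly eight.
short = "\u1F430"     # parsed as "\u1F43" followed by a literal "0"
long_ = "\U0001F430"  # a single character, U+1F430 RABBIT FACE

print(len(short))                    # 2
print([hex(ord(c)) for c in short])  # ['0x1f43', '0x30']
print(len(long_))                    # 1
print(hex(ord(long_)))               # 0x1f430

# \u0020 and \x20 both name U+0020, the space used in the examples above.
print("\u0020" == "\x20" == " ")     # True
```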

See https://docs.python.org/3.3/howto/unicode.html for more info. NB: it can get a little confusing, since Python 2 implicitly converts between bytes and unicode (using the ASCII encoding) in many cases, which can hide the issue from you for a while.
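For contrast, Python 3 drops that implicit conversion: in Python 2, `b"#" + u"\U0001F430"` quietly decodes `"#"` as ASCII and succeeds, only failing later on non-ASCII bytes, whereas Python 3 refuses immediately. A sketch, assuming Python 3, of that behavior and of converting explicitly at the boundary:

```python
# Python 3 refuses to mix bytes and str instead of guessing an encoding.
raised = False
try:
    b"#" + u"\U0001F430"
except TypeError:
    raised = True
print(raised)        # True

# Be explicit: encode text to bytes, or decode bytes to text, then combine.
as_bytes = b"#" + u"\U0001F430".encode("utf-8")
as_text = b"#".decode("ascii") + u"\U0001F430"
print(as_bytes)      # b'#\xf0\x9f\x90\xb0'
print(len(as_text))  # 2
```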

Tom Dalton
  • for this code point a 4-hex-digit escape sequence (`\uxxxx`) isn't enough, you need an 8-hex-digit one (`\Uxxxxxxxx`). Also, if you use python2 syntax you shouldn't link to the documentation for python3 as that can be confusing for the readers. – mata Aug 12 '15 at 14:53
  • don't print text as bytes. Which encoding is used to decode bytes depends on context. – jfs Aug 13 '15 at 18:56