In Python 3, your original_tweet string is a Unicode string (a sequence of code points, not UTF-8 bytes) that contains an emoji. Unicode defines well over 100,000 characters, of which the 128 ASCII characters are only a small subset, so you cannot convert an arbitrary Unicode string into an ASCII string without losing information.
However, if you can live with some data loss (i.e. drop the emoji) then you can try the following (see this or this related question):
original_tweet = "I luv my &lt;3 iphone &amp; you’re awsm ..."
# Encode the Unicode string as UTF-8, which yields a bytes object.
original_tweet_bytes = original_tweet.encode("utf-8")
# Decode those bytes as ASCII, silently dropping every byte that is not valid
# ASCII. Pass errors="strict" instead to raise on the first character that
# cannot be mapped; it is also worth reading up on errors="replace".
original_tweet_ascii = original_tweet_bytes.decode("ascii", errors="ignore")
Or as a simple one-liner:
tweet = original_tweet.encode("utf-8").decode("ascii", errors="ignore")
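To make the difference between the three error handlers concrete, here is a minimal sketch. The sample string is made up for illustration (a curly apostrophe plus one emoji), and the variable names are not from the question.

```python
# Hypothetical sample text: the curly apostrophe (U+2019) and the emoji
# (U+1F600) are the only non-ASCII characters in it.
sample = "you’re awsm \U0001F600"
raw = sample.encode("utf-8")

# errors="ignore" silently drops every byte that is not valid ASCII.
ignored = raw.decode("ascii", errors="ignore")

# errors="replace" substitutes U+FFFD for each undecodable byte, so a
# multi-byte character leaves several replacement characters behind.
replaced = raw.decode("ascii", errors="replace")

# errors="strict" (the default) raises at the first undecodable byte.
try:
    raw.decode("ascii", errors="strict")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    first_bad_offset = exc.start  # byte offset of the first failure

print(repr(ignored))
print(repr(replaced))
print(strict_failed, first_bad_offset)
```

Because the replacement happens per byte, errors="replace" is mostly useful for spotting where data was lost, while errors="strict" is the one to use when you want to find the offending characters.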
Note that this does not convert the HTML entities &lt; and &amp;, which you may have to address separately. You can do that using a proper HTML parser (e.g. lxml), or use a simple string replacement:
tweet = tweet.replace("&lt;", "<").replace("&amp;", "&")
Or, as of Python 3.4, you can use html.unescape() like so:
import html

tweet = html.unescape(tweet)
See also this question on how to handle HTML entities in strings.
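Putting both steps together might look like the sketch below. The helper name to_plain_ascii is hypothetical, and the choice to unescape before stripping is an assumption about what you want: an entity such as &eacute; decodes to a non-ASCII character, so unescaping last would put non-ASCII text back into the result.

```python
import html


def to_plain_ascii(text: str) -> str:
    """Unescape HTML entities, then drop every non-ASCII character.

    Hypothetical helper for illustration; not part of any library.
    """
    # Unescape first so that characters produced by entities are also
    # subject to the ASCII filtering below.
    unescaped = html.unescape(text)
    return unescaped.encode("utf-8").decode("ascii", errors="ignore")


original_tweet = "I luv my &lt;3 iphone &amp; you’re awsm ..."
tweet = to_plain_ascii(original_tweet)
print(tweet)
```

Note that the curly apostrophe is dropped rather than converted to a straight one, which is the data loss mentioned above.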
Addendum. The Unidecode package for Python seems to provide useful functionality for this, too, although in its current version it does not handle emojis.