I have been working on a program to extract tweets from a Twitter account. It looks like this:

import tweepy
from tweepy import OAuthHandler
import json
import time
import sys
import builtins

consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

user = api.get_user('nytimes')

statuses = api.user_timeline(id = user.id, count = 200)

for status in statuses:
    print("***")
    print("Tweet id: " + status.id_str)
    print(status.text)
    print("Retweet count: " + str(status.retweet_count))
    print("Favorite count: " + str(status.favorite_count))
    print(status.created_at)
    print("Status place: " + str(status.place))
    print("Source: " + status.source)
    print("Coordinates: " + str(status.coordinates))

    time.sleep(1)

It works fine... until I get a tweet with an emoji. Then I get this error message:

UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 19-19: Non-BMP character not supported in Tk

Doing some research, I found a bit of code that is supposed to work around this problem:

def print_ucs2(*args, print=builtins.print, **kwds):
    args2 = []
    for a in args:
        a = str(a)
    if max(a) > '\uffff':
        b = a.encode('utf-16le', 'surrogatepass')
        chars = [b[i:i+2].decode('utf-16le', 'surrogatepass')
    for i in range(0, len(b), 2)]
        a = ''.join(chars)
        args2.append(a)
        print(*args2, **kwds) 

builtins._print = builtins.print 
builtins.print = print_ucs2

The problem is, once I add this bit of code to my program, it ONLY prints emojis. Nothing else. I don't have the error message anymore... but I don't have the tweets either.
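
Judging only from the snippet as pasted above, one plausible cause is indentation: the `if` block runs once after the loop instead of once per argument, and both `args2.append(a)` and the final `print` sit inside the `if`, so output only appears when the last argument contains a non-BMP character. A minimal sketch of the function with the indentation that reading implies is below. The empty-string guard (`a and ...`) is an addition for illustration, not part of the pasted code.

```python
import builtins

def print_ucs2(*args, print=builtins.print, **kwds):
    """Print args, splitting non-BMP characters into UTF-16 surrogate pairs."""
    args2 = []
    for a in args:
        a = str(a)
        # Only rewrite strings that contain a character above U+FFFF.
        # The `a and` guard (an added assumption) avoids max() on an empty string.
        if a and max(a) > '\uffff':
            b = a.encode('utf-16le', 'surrogatepass')
            chars = [b[i:i+2].decode('utf-16le', 'surrogatepass')
                     for i in range(0, len(b), 2)]
            a = ''.join(chars)
        # Append every argument, whether or not it was rewritten.
        args2.append(a)
    # Print once, after all arguments have been processed.
    print(*args2, **kwds)
```

With this layout, plain-text arguments are passed through unchanged and only strings containing emoji are converted. Whether the surrogate pairs then display correctly still depends on the console or IDE doing the printing.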

I've also read that something could be done with .encode('utf-8'), but I'm not sure where to put it; so far I've only gotten error messages when using it. Any ideas?

Thanks,

Pelo
  • you may refer to this answer, https://stackoverflow.com/a/32442684/4662041 – Sheshnath Nov 22 '17 at 14:31
  • Hi, thanks I actually saw this answer but I don't know how to apply it to my code... I've tried a few things but without success, I get a "NameError: name 'x' is not defined" error. – Pelo Nov 23 '17 at 09:14
  • Oh nevermind, I managed to make it work! Thanks a lot! – Pelo Nov 23 '17 at 13:02

1 Answer

(Posted on behalf of the question author).

Solved it! Here's the line to make it work:

non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
...
print(status.text.translate(non_bmp_map))
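
For anyone who wants to try the idea without Twitter credentials, here is a small self-contained sketch of the same `translate` approach. The helper name `bmp_safe` and the sample string are made up for illustration; only `non_bmp_map` comes from the lines above.

```python
import sys

# Map every code point outside the Basic Multilingual Plane (U+10000 and up)
# to U+FFFD, the Unicode replacement character.
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)

def bmp_safe(text):
    """Return text with each non-BMP character replaced by U+FFFD.

    Hypothetical helper name; it just wraps str.translate with the map above.
    """
    return str(text).translate(non_bmp_map)

# Stand-in for status.text: a string ending in an emoji (U+1F600).
sample = "Breaking news \U0001F600"

# ascii() is used so this demo prints safely on any console encoding.
print(ascii(bmp_safe(sample)))
```

In the original loop, this would presumably mean replacing `print(status.text)` with `print(bmp_safe(status.text))`. The trade-off is that the emoji are lost and shown as replacement characters, rather than being displayed.
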
halfer