
I'm using tweepy to collect random tweets, and I want to filter out non-alphanumeric tweets.

But to do the check, I first need to convert each tweet to a string. For example:

from tweepy import StreamListener
....

class sListener(StreamListener):
    def on_status(self, status):
        ....
        text = str(status.text)
        if not isAlphanumeric(text):
            ......

However, calling str() on the tweet itself raises an error when the tweet contains non-ASCII characters:

UnicodeEncodeError: 'ascii' codec can't encode character

So I'm stuck in a loop: I need to convert the tweet to a string to filter out non-ASCII tweets, but I can't convert it to a string because of the non-ASCII characters.

I don't even know what data type tweets are...

Could anyone please help me out?

CosmicRabbitMediaInc

3 Answers


It seems your tweets' encoding is not ASCII.

Try

text = unicode(status.text)

instead of

text = str(status.text)
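The UnicodeEncodeError suggests that status.text is already a unicode object, so the check may not need str() at all. A minimal sketch of an ASCII test that works directly on the text, assuming a hypothetical helper named is_ascii (not part of tweepy) and made-up sample strings in place of real tweets:

```python
def is_ascii(text):
    """Return True if text contains only ASCII characters (hypothetical helper)."""
    try:
        # Encoding to ASCII fails as soon as a non-ASCII character is found.
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

# Made-up sample strings standing in for status.text values.
samples = [u"plain tweet 123", u"caf\u00e9 time"]
kept = [s for s in samples if is_ascii(s)]
```

Inside on_status, the same idea would be to call is_ascii(status.text) and skip the tweet when it returns False.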
Abhijit

Try

text = status.text.encode('utf8')

I had a similar issue in the past. See if this works:

tweetText = status.text.encode("utf-8")
tweetText = unicode(tweetText, errors='ignore')
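The two lines above end up dropping every non-ASCII character from the tweet. If that is the goal, a one-step version should give the same result. This is only a sketch: strip_non_ascii is a hypothetical helper name and the sample string is made up.

```python
def strip_non_ascii(text):
    """Drop every non-ASCII character from text (hypothetical helper)."""
    # errors="ignore" silently skips characters that ASCII cannot represent;
    # decoding again turns the resulting bytes back into text.
    return text.encode("ascii", "ignore").decode("ascii")

# Made-up sample standing in for status.text.
cleaned = strip_non_ascii(u"na\u00efve caf\u00e9 #1")
```

Note that this discards characters rather than rejecting the tweet, so accented words come out shortened.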
user2789945