Most efficient way to Twitter Stream?

Question

My partner and I started learning Python at the beginning of the year. We are at the point where a) we are almost finished with our code, but b) we are pulling our hair out trying to get it to work.

Assignment: Pull 250 tweets on a certain topic, geocode the location of each tweet, analyze their sentiment, then display them on a web map. We have accomplished almost all of that except the 250 tweets requirement.

I do not know how to pull the tweets more efficiently. The code works, but it writes only about seven to twelve rows to a CSV before it times out.

I tried setting a tracking parameter, but received this error: TypeError: 'NoneType' object is not subscriptable

I tried expanding the locations parameter to stream.filter(locations=[-180,-90,180,90]), but received a similar error: TypeError: 'NoneType' object has no attribute 'latitude'

I really do not know what I am missing and I was wondering if anyone has any ideas.

CODE BELOW:

from geopy import geocoders
from geopy.exc import GeocoderTimedOut
import tweepy
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
from textblob import TextBlob
import json
import csv

def geo(location):
    g = geocoders.Nominatim(user_agent='USER')
    if location is not None:
        loc = g.geocode(location, timeout=None)
        if loc.latitude and loc.longitude is not None:
            return loc.latitude, loc.longitude

def WriteCSV(user, text, sentiment, lat, long):
    f = open('D:/PATHWAY/TO/tweets.csv', 'a', encoding="utf-8")
    write = csv.writer(f)
    write.writerow([user, text, sentiment, lat, long])
    f.close()

CK = ''
CS = ''
AK = ''
AS = ''

auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AK, AS)

#By setting these values to true, our code will automatically wait as it hits its limits
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

#Now I'm going to set up a stream listener
#https://stackoverflow.com/questions/20863486/tweepy-streaming-stop-collecting-tweets-at-x-amount
#https://wafawaheedas.gitbooks.io/twitter-sentiment-analysis-visualization-tutorial/sentiment-analysis-using-textblob.html        
class StdOutListener(tweepy.StreamListener):
    def __init__(self, api=None):
        super(StdOutListener, self).__init__()
        self.num_tweets = 0

    def on_data(self, data):
        Data = json.loads(data)
        Author = Data['user']['screen_name']
        Text = Data['text']
        Tweet = TextBlob(Data["text"])
        Sentiment = Tweet.sentiment.polarity
        x,y = geo(Data['place']['full_name'])
        if "coronavirus" in Text:
            WriteCSV(Author, Text, Sentiment, x,y)
            self.num_tweets += 1
            if self.num_tweets < 50:
                return True
            else:
                return False

stream = tweepy.Stream(auth=api.auth, listener=StdOutListener())
stream.filter(locations=[-122.441, 47.255, -122.329, 47.603])

is the indentation after the first import a mistake? imports should not be indented — geher, Mar 02 '20 at 09:37

Tin Nguyen · Answer 1 · 2020-03-02T11:50:59.117

The Twitter and geolocation APIs return all kinds of data, and some of the fields may be missing.

TypeError: 'NoneType' object has no attribute 'latitude'

This error comes from here:

loc = g.geocode(location, timeout=None)
if loc.latitude and loc.longitude is not None:
  return loc.latitude, loc.longitude

You provide a location and the geocoder searches for it, but it cannot find a match, so g.geocode returns None and loc is None.
Consequently loc.latitude fails, because None has no attribute latitude.

You should check that loc is not None before accessing any of its attributes.
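
A minimal sketch of that check, not a drop-in replacement. It assumes the geocoder is passed in as an argument (in your code that would be the geocoders.Nominatim instance), and StubGeocoder is a hypothetical stand-in added only so the example runs without a network connection. The timeout of 10 seconds is also an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a geopy geocoder so the sketch runs offline.
Location = namedtuple("Location", ["latitude", "longitude"])


class StubGeocoder:
    """Knows a single place and returns None for everything else."""

    def geocode(self, query, timeout=None):
        if query == "Seattle, WA":
            return Location(47.6, -122.3)
        return None  # a real geocoder also returns None when nothing matches


def geo(location, geocoder):
    """Return (latitude, longitude), or None if the place cannot be resolved."""
    if not location:
        return None
    loc = geocoder.geocode(location, timeout=10)
    if loc is None:  # check the result before touching its attributes
        return None
    return loc.latitude, loc.longitude


coords = geo("Seattle, WA", StubGeocoder())
missing = geo("no such place", StubGeocoder())
```

Because geo can now return None, the caller should test the result before unpacking it into x and y, and skip the tweet when there are no coordinates.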

x,y = geo(Data['place']['full_name'])

I know you are filtering tweets by location, so your Twitter Status object should have Data['place']['full_name']. But this is not always the case. If Data['place'] is None, subscripting it raises TypeError: 'NoneType' object is not subscriptable, which is probably the first error you saw. You should check that the key really exists before accessing its value.
This applies generally and should be applied to your whole code. Write robust code. You will have an easier time debugging if you wrap the risky parts in try/except and print out the objects to see how they are structured. You could also set a breakpoint in the except block and inspect the objects live.
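
One way this could look, as a sketch only. The helper names parse_tweet and extract_place_name are hypothetical, and the sample payloads are made up for illustration rather than taken from real stream output.

```python
import json


def extract_place_name(tweet):
    """Return tweet['place']['full_name'], or None if any part is missing."""
    place = tweet.get("place")
    if not isinstance(place, dict):  # place is often null in the payload
        return None
    return place.get("full_name")


def parse_tweet(raw):
    """Parse one raw stream message into (author, text, place_name).

    Returns None for anything that does not look like a tweet.
    """
    try:
        data = json.loads(raw)
        author = data["user"]["screen_name"]
        text = data["text"]
    except (ValueError, KeyError, TypeError) as exc:
        # Print the problem so you can see how the object is built.
        print(f"Skipping message: {exc!r}")
        return None
    return author, text, extract_place_name(data)


# Made-up sample messages: one with a place, one without, one non-tweet.
with_place = json.dumps(
    {"user": {"screen_name": "a"}, "text": "hi", "place": {"full_name": "Seattle, WA"}}
)
without_place = json.dumps({"user": {"screen_name": "a"}, "text": "hi", "place": None})
not_a_tweet = json.dumps({"limit": {"track": 1}})

results = [parse_tweet(m) for m in (with_place, without_place, not_a_tweet)]
```

In on_data you would then return True to keep the stream open whenever parse_tweet gives None or the place name is None, instead of letting the exception end the stream.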
