23

I found the following piece of code, which works well for viewing the standard 1% sample of the Twitter firehose in the Python shell:

import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key = ""
access_secret = "" 


auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print status.text

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['manchester united'])

How do I add a filter to only parse tweets from a certain location? I've seen people adding GPS coordinates to other Twitter-related Python code, but I can't find anything specific to sapi within the Tweepy module.

Any ideas?

Thanks

gdogg371
  • I think my problem is a concatenation one. The syntax for a filter by GPS would be 'sapi.filter(locations=[-122.75,36.8,-121.75,37.8])'; however, combining it with a track (keyword) filter does not seem to work with the syntax I am using. – gdogg371 Apr 06 '14 at 02:39

4 Answers

29

The streaming API doesn't allow filtering by location AND keyword simultaneously.

Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.

Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations

What you can do is ask the streaming API for either keyword-matched or location-matched tweets, and then filter the resulting stream in your app by inspecting each tweet.

If you modify the code as follows, you will capture tweets from the United Kingdom and then filter them to show only those that contain "manchester united":

import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key=""
access_secret=""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        if 'manchester united' in status.text.lower():
            print status.text

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())    
sapi.filter(locations=[-6.38,49.87,1.77,55.81])
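
The opposite approach (track the keyword, then keep only tweets geolocated inside a box) could be sketched roughly as below. This is only a sketch: the helper name `in_bounding_box` is made up for illustration, and it assumes the tweet's coordinates arrive as a GeoJSON point (`{"type": "Point", "coordinates": [lon, lat]}`) or as nothing at all. It also assumes the box does not cross the 180° meridian.

```python
def in_bounding_box(coordinates, box):
    """Return True if a GeoJSON Point lies inside the box.

    coordinates: assumed shape {"type": "Point", "coordinates": [lon, lat]},
                 or None when the tweet has no exact point.
    box: [west_lon, south_lat, east_lon, north_lat], the same order
         used by the `locations` parameter.
    """
    if not coordinates:
        return False  # tweet carries no exact point, so it cannot match
    lon, lat = coordinates["coordinates"]  # GeoJSON order: longitude first
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


# San Francisco box taken from the question's comment
SAN_FRANCISCO = [-122.75, 36.8, -121.75, 37.8]
```

With `sapi.filter(track=['manchester united'])`, the check inside `on_status` might then read `if in_bounding_box(status.coordinates, SAN_FRANCISCO):`, assuming `status.coordinates` holds that GeoJSON dict or None.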
Juan E.
  • That explains why I am getting tweets about SF and United, but not tweets from SF that are only about Manchester United. Have you managed to locate the syntax in the dev docs for your suggestion at all? Thanks. – gdogg371 Apr 06 '14 at 03:06
  • If I change the sapi.filter to sapi.filter(track=['manchester united'], locations=[-122.75,36.8,-121.75,37.8]) I see tweets that mention MU from all over the world and tweets from SF that do not necessarily mention United. If you want tweets from SF that mention MU, you have 2 alternatives: 1 - you can request tweets from SF and then check each tweet's text to see if it includes the substring "Manchester United", or 2 - you can request tweets with the keyword "Manchester United" and then check whether they have been geolocated and whether the lat/lon are within the bounding box of SF. – Juan E. Apr 06 '14 at 04:06
  • any suggestions for syntax on that? – gdogg371 Apr 06 '14 at 12:00
  • I modified the answer to show you the first alternative. – Juan E. Apr 06 '14 at 14:44
  • Thanks for that. I managed to find a similar way to code it, but your method works just as well. In this example, how can I print the geolocation of the tweet? I have tried using 'print "location = ", sapi.filter(locations)' as the final line of the code. It's not causing errors, but it isn't creating any output either, and I'm not sure why. – gdogg371 Apr 06 '14 at 15:34
  • Using "print status.coordinates" will show you the value. – Juan E. Apr 06 '14 at 15:45
  • I am stuck in tracking a user. It's easy to pass screen_name as a parameter in API method. But, how do i track a user? – Nazaf Anwar May 22 '16 at 15:04
  • The usefulness is limited because users have to opt into tracking tweets, or they can track their home. The location is given as a series of polygons, so it can be many places. So it might give the coordinates for one place while the named location is completely different. – Walker Rowe Feb 04 '19 at 19:15
20

Juan gave the correct answer. I'm filtering for Germany only using this:

# Bounding boxes for geolocations
# Online-Tool to create boxes (c+p as raw CSV): http://boundingbox.klokantech.com/
GEOBOX_WORLD = [-180,-90,180,90]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]

stream.filter(locations=GEOBOX_GERMANY)

This is a pretty crude box that includes parts of some other countries. If you want finer granularity, you can combine multiple boxes to fill out the location you need.
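
As a rough sketch of combining boxes: as far as I know, the `locations` parameter takes several boxes as one flat list, four numbers per box with the south-west corner first. The helper `combine_boxes` and the two boxes below are hypothetical, hand-picked for illustration, and not a careful outline of any country.

```python
def combine_boxes(*boxes):
    """Flatten several [west, south, east, north] boxes into one list."""
    flat = []
    for box in boxes:
        if len(box) != 4:
            raise ValueError("each box needs 4 numbers: west, south, east, north")
        west, south, east, north = box
        # Reject boxes whose corners are swapped (south-west must come first)
        if west >= east or south >= north:
            raise ValueError("south-west corner must come first: %r" % (box,))
        flat.extend(box)
    return flat


# Hypothetical boxes, for illustration only
BOX_SOUTH = [6.0, 47.3, 13.8, 50.5]
BOX_NORTH = [6.8, 50.5, 14.8, 54.9]

locations = combine_boxes(BOX_SOUTH, BOX_NORTH)
```

The resulting list could then be passed as `stream.filter(locations=locations)`; a tweet matching any one of the boxes would be delivered.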

It should be noted, though, that filtering by geotags limits the number of tweets quite a bit. The figure below comes from roughly 5 million tweets in my test database (the query returns the fraction of tweets that actually contain a geolocation):

> db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
0.016668392651547598

So only 1.67% of my sample of the 1% stream includes a geotag. However, there are other ways of figuring out a user's location: http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf

Kristian Rother
0

You can't filter it while streaming, but you could filter it at the output stage if you were writing the tweets to a file.
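
A minimal sketch of that output-stage filtering, assuming the tweets were saved one JSON object per line with a `text` field (that file layout and the function name `filter_saved_tweets` are assumptions, not something the stream code above produces by itself):

```python
import json


def filter_saved_tweets(lines, keyword):
    """Yield saved tweets (as dicts) whose text contains the keyword.

    lines: any iterable of strings, e.g. an open file, with one JSON
           tweet per line (assumed layout).
    """
    keyword = keyword.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Case-insensitive substring match on the tweet text
        if keyword in tweet.get("text", "").lower():
            yield tweet
```

Usage would be along the lines of opening the saved file and looping over `filter_saved_tweets(f, "manchester united")`; a location check on each tweet's `coordinates` could be added in the same loop.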

Clovis
-3

sapi.filter(track=['manchester united'],locations=['GPS Coordinates'])

gdogg371