Filtering Twitter data using Tweepy

Question

I've used Marco Bonzanini's tutorial on mining Twitter data: https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/

```python
from tweepy.streaming import StreamListener  # Tweepy 3.x


class MyListener(StreamListener):

    def on_data(self, data):
        # `data` is the raw JSON string of one stream message
        try:
            with open('python.json', 'a') as f:
                f.write(data)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        print(status)
        return True
```

and used the `follow` parameter of the `filter` method to retrieve the tweets produced by this specific ID:

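```python
from tweepy import Stream  # Tweepy 3.x; `auth` is the OAuthHandler set up in the tutorial

twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(follow=["63728193"])  # random Twitter ID
```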

However, it does not seem to do the job: it returns not only the tweets and retweets created by that ID but also every tweet in which the ID is mentioned (e.g. retweets of its tweets). That is not what I want.

I'm sure there must be a way to do it, since the JSON that Twitter returns has a `screen_name` field (under `user`) giving the name of the tweet's creator. I just have to find out how to filter the data on this `screen_name` field.
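As a minimal sketch of that idea, assuming each stream message is the raw JSON string of a Twitter API v1.1 tweet (where the author sits under `user`), the check could look like this. `is_from_screen_name` is a hypothetical helper name and the sample tweets are made up.

```python
import json


def is_from_screen_name(raw_tweet, screen_name):
    """Return True if the raw JSON tweet string was authored by `screen_name`."""
    tweet = json.loads(raw_tweet)  # the stream delivers a str, so decode it first
    # Non-tweet stream messages (e.g. delete notices) have no "user" object.
    user = tweet.get("user")
    if user is None:
        return False
    return user.get("screen_name") == screen_name


# Made-up sample messages for illustration only.
own_tweet = json.dumps({"text": "hello", "user": {"screen_name": "example_user"}})
reply = json.dumps({
    "text": "@example_user hi",
    "user": {"screen_name": "someone_else"},
})

print(is_from_screen_name(own_tweet, "example_user"))  # True
print(is_from_screen_name(reply, "example_user"))      # False
```

Matching on `user.id_str` instead of `screen_name` may be more robust, since a user can change their screen name but not their ID.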

score 2 · Accepted Answer · answered Feb 14 '17 at 16:46


This behaviour is by design. To quote the Twitter streaming API docs:

> For each user specified, the stream will contain:
>
> - Tweets created by the user.
> - Tweets which are retweeted by the user.
> - Replies to any Tweet created by the user.
> - Retweets of any Tweet created by the user.
> - Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).
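To see which of these rules delivered a given message, a sketch like the following could help. It assumes the decoded Twitter API v1.1 tweet format, where `user.id_str` identifies the author, `retweeted_status` is present only on retweets and `in_reply_to_user_id_str` is set on replies. `delivery_reason` is a hypothetical helper, and manual replies are not specifically detected: they may fall into the reply case or the fallback depending on how their fields are populated.

```python
def delivery_reason(tweet, followed_id):
    """Guess why `tweet` (a decoded v1.1 tweet dict) matched follow=[followed_id]."""
    author_id = tweet["user"]["id_str"]
    retweeted = tweet.get("retweeted_status")  # only present on retweets

    if author_id == followed_id:
        return "retweeted by user" if retweeted else "created by user"
    if retweeted is not None and retweeted["user"]["id_str"] == followed_id:
        return "retweet of user's tweet"
    if tweet.get("in_reply_to_user_id_str") == followed_id:
        return "reply to user"
    # Anything else; this sketch does not inspect manual replies specifically.
    return "other"


# Made-up sample tweets; "42" stands in for the followed user's ID.
samples = [
    {"user": {"id_str": "42"}},
    {"user": {"id_str": "7"}, "retweeted_status": {"user": {"id_str": "42"}}},
    {"user": {"id_str": "7"}, "in_reply_to_user_id_str": "42"},
]
for t in samples:
    print(delivery_reason(t, "42"))
# created by user
# retweet of user's tweet
# reply to user
```

Only the first case (and "retweeted by user", if you want to keep the user's own retweets) is what the question is after; everything else is what `follow` adds on top.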

This is best handled by checking who created each tweet as it is received, which I believe can be done as follows:

```python
from tweepy.streaming import StreamListener  # Tweepy 3.x


class MyListener(StreamListener):
    def on_data(self, data):
        try:
            # The user's "id" is an integer in the JSON; "id_str" holds it as a string
            if data._json['user']['id_str'] == "63728193":
                with open('python.json', 'a') as f:
                    f.write(data)
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        print(status)
        return True
```


asongtoruin


Not sure what your comment is getting at here? – asongtoruin Feb 15 '17 at 10:23
Your code is almost ok. However `data` is not a JSON object but a str object. You have to convert `data` from str to dict before accessing its fields: `class MyListener(StreamListener): def on_data(self, data): try: data = json.loads(data) if data['user']["screen_name"] == "lemondefr": print(data) except BaseException as e: print(str(e)) return True def on_error(self, status): print(status) return True` – Nahid O. Feb 15 '17 at 10:24
@NahidO. a `Status` object in Tweepy has a `._json` property, which should let you access the various properties of the tweet as described in my answer. I'd be interested to know if using `data._json['user']['screen_name']` didn't work for you – asongtoruin Feb 15 '17 at 10:31
I don't know why it didn't work. Maybe a `data` object doesn't work like a `Status` object? I don't know that much about tweepy. If you look at this other tweepy post (http://stackoverflow.com/questions/23531608/how-do-i-save-streaming-tweets-in-json-via-tweepy), the user also converted his `data` object to a dict. – Nahid O. Feb 15 '17 at 11:14
@NahidO. ah! I didn't realise the difference between `on_data(self, data)` and `on_data(self, status)`. The code I have provided actually would only work for `on_data(self, status)`, my mistake. – asongtoruin Feb 15 '17 at 11:23
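Putting the comments together, a corrected sketch: decode the raw string with `json.loads`, then compare the author's `id_str` (a string, unlike the integer `id`) with the ID passed to `follow`. It is written as a plain function, `save_if_authored_by` (a hypothetical name), so it can be tried without a live stream; the trailing comment shows roughly how a Tweepy 3.x `StreamListener` subclass might call it.

```python
import json


def save_if_authored_by(data, user_id, path="python.json"):
    """Append the raw tweet `data` to `path` only if `user_id` created it.

    Returns True when the tweet was written, False otherwise.
    """
    try:
        tweet = json.loads(data)  # on_data receives a str, not a Status
    except json.JSONDecodeError as e:
        print("Error on_data: %s" % e)
        return False
    # Delete/limit notices have no "user"; compare id_str, not the int "id".
    user = tweet.get("user") if isinstance(tweet, dict) else None
    if user is None or user.get("id_str") != user_id:
        return False
    with open(path, "a") as f:
        # Keep exactly one tweet per line in the output file.
        f.write(data.rstrip("\r\n") + "\n")
    return True


# Inside a Tweepy 3.x listener this would be used roughly as:
#
#     class MyListener(StreamListener):
#         def on_data(self, data):
#             save_if_authored_by(data, "63728193")
#             return True
```

Comparing on the numeric ID rather than `screen_name` keeps the filter working even if the account is renamed.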
