I've used the listener from Marco Bonzanini's tutorial on mining Twitter data: https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/

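from tweepy import Stream
from tweepy.streaming import StreamListener

class MyListener(StreamListener):

    def on_data(self, data):
        # Append every raw JSON message from the stream to the output file.
        try:
            with open('python.json', 'a') as f:
                f.write(data)
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        # Returning True keeps the stream connection open.
        return True

    def on_error(self, status):
        print(status)
        return True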

and used the "follow" parameter of the filter method to retrieve the tweets produced by this specific ID :

twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(follow=["63728193"])  # random Twitter ID

However, this does not do what I need: it returns not only the tweets and retweets created by that ID, but also every tweet in which the ID is mentioned (e.g. other users' retweets of its tweets). That is not what I want.

I'm sure there must be a way to do this, since the JSON returned by Twitter contains a "screen_name" field giving the name of the tweet's creator. I just need to find out how to filter the data on this screen_name field.

Nahid O.

1 Answer

This behaviour is by design. To quote the Twitter streaming API docs:

For each user specified, the stream will contain:

  • Tweets created by the user.
  • Tweets which are retweeted by the user.
  • Replies to any Tweet created by the user.
  • Retweets of any Tweet created by the user.
  • Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).
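To make those categories concrete, here is a rough sketch of how a decoded tweet could be sorted into them. It assumes the tweet has already been parsed into a dict and that it carries the `user.id_str`, `retweeted_status` and `in_reply_to_user_id_str` fields as I understand them from the v1.1 tweet payload; `classify_tweet` and its labels are names I made up for illustration, not part of Tweepy or the Twitter API.

```python
def classify_tweet(tweet, followed_id):
    """Return a rough label for why a tweet dict showed up in a follow stream.

    `tweet` is a tweet already decoded into a dict, `followed_id` is the
    followed user's ID as a string. The labels are hypothetical.
    """
    author_id = tweet.get("user", {}).get("id_str")
    retweeted = tweet.get("retweeted_status")

    if author_id == followed_id:
        # Written by the followed user: either an original tweet or a retweet.
        return "retweet_by_user" if retweeted else "created_by_user"
    if retweeted and retweeted.get("user", {}).get("id_str") == followed_id:
        # Someone else retweeted one of the followed user's tweets.
        return "retweet_of_user"
    if tweet.get("in_reply_to_user_id_str") == followed_id:
        # Someone else replied to the followed user.
        return "reply_to_user"
    return "other"


if __name__ == "__main__":
    sample = {"user": {"id_str": "63728193"}, "text": "hello"}
    print(classify_tweet(sample, "63728193"))
```

Only the first two labels correspond to tweets actually produced by the followed account, which is the subset the question is after.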

The best way for you to process it for your purposes is to check who created the tweet as it is received, which I believe can be done as follows:

class MyListener(StreamListener):
    def on_data(self, data):
        try:
            if data._json['user']['id'] == "63728193":
                with open('python.json', 'a') as f:
                    f.write(data)
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        print(status)
        return True
asongtoruin
  • Not sure what your comment is getting at here? – asongtoruin Feb 15 '17 at 10:23
  • Your code is almost OK. However, `data` is not a JSON object but a str. You have to convert `data` from str to dict before reading fields from it: `class MyListener(StreamListener): def on_data(self, data): try: data = json.loads(data) if data['user']["screen_name"] == "lemondefr": print(data) except BaseException as e: print(str(e)) return True def on_error(self, status): print(status) return True` – Nahid O. Feb 15 '17 at 10:24
  • @NahidO. a `Status` object in Tweepy has a `._json` property, which should let you access the various properties of the tweet as described in my answer. I'd be interested to know if using `data._json['user']['screen_name']` didn't work for you – asongtoruin Feb 15 '17 at 10:31
  • I don't know why it didn't work. Maybe a `data` object doesn't work like a `Status` object? I don't know that much about tweepy. If you look at this other tweepy post (http://stackoverflow.com/questions/23531608/how-do-i-save-streaming-tweets-in-json-via-tweepy), the user also converted his `data` object to a dict. – Nahid O. Feb 15 '17 at 11:14
  • @NahidO. ah! I didn't realise the difference between `on_data(self, data)` and `on_status(self, status)`. The code I have provided actually would only work for `on_status(self, status)`, my mistake. – asongtoruin Feb 15 '17 at 11:23
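
Pulling the comment thread together: in `on_data` the payload arrives as a raw JSON string, so it has to be decoded before the author can be checked. Below is a minimal sketch of that check, kept free of Tweepy so it can be run on its own. The helpers `authored_by` and `save_if_authored_by` are hypothetical names, and comparing on `id_str` rather than the numeric `id` is my own choice to avoid a string/int mismatch.

```python
import json


def authored_by(raw_data, user_id):
    """Return True if raw_data is a JSON tweet whose author is user_id."""
    try:
        tweet = json.loads(raw_data)
    except ValueError:
        return False  # not valid JSON, e.g. a keep-alive blank line
    if not isinstance(tweet, dict):
        return False
    user = tweet.get("user")
    if not user:
        return False  # stream notices (deletes, limits) have no "user" key
    # Compare as strings so "63728193" and 63728193 both match.
    return user.get("id_str") == str(user_id)


def save_if_authored_by(raw_data, user_id, path="python.json"):
    """Append raw_data to path only when the tweet was written by user_id."""
    if authored_by(raw_data, user_id):
        with open(path, "a") as f:
            f.write(raw_data)
        return True
    return False


# Inside MyListener.on_data(self, data) one would then call, for example:
#     save_if_authored_by(data, "63728193")
#     return True
```

This keeps both tweets and retweets posted by the account and drops replies and retweets coming from other users. Filtering on `screen_name` instead, as in the comment above, works the same way, but the numeric ID does not change if the account is renamed.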