
While running this program to retrieve Twitter data using Python 2.7.8:

#imports
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

#setting up the keys
consumer_key = '…………...'
consumer_secret = '………...'
access_token = '…………...'
access_secret = '……………..'

class TweetListener(StreamListener):
    # A listener handles tweets received from the stream.
    # This is a basic listener that just prints received tweets to standard output.

    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        print(status)

#printing all the tweets to the standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)



stream = Stream(auth, TweetListener())

t = u"سوريا"
stream.filter(track=[t])

After running this program for 5 hours, I got this error message:

Traceback (most recent call last):
  File "/Users/Mona/Desktop/twitter.py", line 32, in <module>
    stream.filter(track=[t])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 316, in filter
    self._start(async)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 237, in _start
    self._run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 173, in _run
    self._read_loop(resp)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 225, in _read_loop
    next_status_obj = resp.read( int(delimited_string) )
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 543, in read
    return self._read_chunked(amt)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 612, in _read_chunked
    value.append(self._safe_read(chunk_left))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 660, in _safe_read
    raise IncompleteRead(''.join(s), amt)
IncompleteRead: IncompleteRead(0 bytes read, 976 more expected)

I don't know how to fix this problem!

Asked by Hana (edited by James Scholes)
  • https://github.com/tweepy/tweepy/pull/498 This was fixed recently. Make sure you're using the latest Tweepy. – Luigi Oct 29 '14 at 19:20
  • When I try to install the new version of Tweepy with "pip install tweepy" in the Mac OS X Terminal, I get this message: "Requirement already satisfied (use --upgrade to upgrade): tweepy in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages Cleaning up…". How can I overwrite the previous version? – Hana Oct 29 '14 at 19:57
  • `pip install tweepy --upgrade`. The update was only pushed to GitHub 8 days ago, though, so pip may not have the latest version yet. You can always check the source yourself (or edit it) to be sure; I think the change is about one line. – Luigi Oct 29 '14 at 20:08
  • Aha, it's the same as Tweepy 2.3: he just added the line "except (Timeout, ssl.SSLError, requests.compat.IncompleteRead) as exc:" in tweepy/streaming.py, and I already have that line in Tweepy 2.3. :( – Hana Oct 29 '14 at 20:11
  • Could you add an exception handler in your `TweetListener`? – Luigi Oct 30 '14 at 00:16

5 Answers


Use the stall_warnings parameter to check whether you're failing to process tweets quickly enough:

stream.filter(track=[t], stall_warnings=True)

These messages are handled by Tweepy (check out the implementation here) and will tell you if you're falling behind, meaning that you can't process tweets as quickly as the Twitter API is sending them to you. From the Twitter docs:

Setting this parameter to the string true will cause periodic messages to be delivered if the client is in danger of being disconnected. These messages are only sent when the client is falling behind, and will occur at a maximum rate of about once every 5 minutes.

In theory, you should receive a disconnect message from the API in this situation. However, that is not always the case:

The streaming API will attempt to deliver a message indicating why a stream was closed. Note that if the disconnect was due to network issues or a client reading too slowly, it is possible that this message will not be received.

The IncompleteRead could also be due to a temporary network issue and may never happen again. If it happens reproducibly after about 5 hours though, falling behind is a pretty good bet.
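Because the question's TweetListener overrides on_data, stall warnings and disconnect notices arrive there as raw JSON strings mixed in with ordinary tweets, so the listener has to recognise them itself. A minimal sketch, assuming the message shapes described in Twitter's streaming docs (a top-level "warning" object carrying "percent_full" for stall warnings, and a top-level "disconnect" object carrying a "reason"); classify_message is a hypothetical helper, not part of Tweepy:

```python
import json


def classify_message(raw):
    """Sort a raw streaming message into a tweet, stall warning, or disconnect.

    Assumes the JSON shapes from Twitter's streaming docs:
    {"warning": {"code": ..., "percent_full": N, ...}} for stall warnings and
    {"disconnect": {"code": ..., "reason": ...}} for disconnect notices.
    """
    data = json.loads(raw)
    if "warning" in data:
        # Stall warning: report how full the server-side queue is.
        return "warning", data["warning"].get("percent_full")
    if "disconnect" in data:
        # Disconnect notice: report the reason, if one was given.
        return "disconnect", data["disconnect"].get("reason")
    # Anything else is treated as an ordinary tweet payload.
    return "tweet", data


# Inside TweetListener.on_data you might then do something like:
#     kind, info = classify_message(data)
#     if kind == "warning":
#         print("falling behind, queue %s%% full" % info)
```

This lets on_data log a warning (or cut back on per-tweet work) instead of printing it as though it were a tweet.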

Answered by Luigi
  • I may have missed it, but stall_warnings only produces a warning that confirms the type of error; I don't think you've provided a solution. I have this problem right now and suspect you might be right, so if you know a solution, I'd appreciate it if you shared it with us. – A-nak Wannapaschaiyong Jan 11 '21 at 12:41

I've just had this problem. The other answer is factually correct, in that it's almost certainly:

  • Your program isn't keeping up with the stream.
  • You get a stall warning if that's the case.

In my case, I was reading the tweets into Postgres for later analysis, across a fairly dense geographic area as well as keywords (London, in fact, and about 100 keywords). Even though you're just printing the tweets, your local machine may be doing a lot of other things, and if system processes get priority, the tweets will back up until Twitter disconnects you. (This typically manifests as an apparent memory leak: the program grows in size until it gets killed or Twitter disconnects it, whichever comes first.)

What made sense here was to push the processing off to a queue. I used Redis and django-rq; it took about 3 hours to implement on my dev and then my production server, including researching, installing, rejigging existing code, being stupid about my installation, testing, and misspelling things as I went.

Now, in your Django project directory (where appropriate; YMMV for plain Python applications), run: python manage.py rqworker &

You now have a queue! You can add jobs to it by changing your handler like this. At the top of the file:

from __future__ import print_function  # needed on Python 2 so print can be passed as a function
import django_rq

Then in your handler section:

def on_data(self, data):
    django_rq.enqueue(print, data)
    return True
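If you'd rather not bring in Redis and django-rq, the same idea (keep on_data fast and do the slow work elsewhere) can be sketched with the standard library. This swaps in an in-process queue.Queue and a worker thread for the Redis-backed queue; process_tweet is a hypothetical placeholder for your slow step (e.g. a database insert), and unlike Redis, anything still queued is lost if the process dies.

```python
import threading

try:
    import queue  # Python 3
except ImportError:  # Python 2.7, as in the question
    import Queue as queue

tweet_queue = queue.Queue()
processed = []


def process_tweet(data):
    # Hypothetical slow step, e.g. an INSERT into Postgres.
    processed.append(data)


def worker():
    # Pull raw tweets off the queue until a None sentinel arrives.
    while True:
        data = tweet_queue.get()
        if data is None:
            tweet_queue.task_done()
            break
        process_tweet(data)
        tweet_queue.task_done()


worker_thread = threading.Thread(target=worker)
worker_thread.daemon = True
worker_thread.start()


def on_data(data):
    # What TweetListener.on_data would do: hand the tweet off and return
    # at once, so the stream keeps being read while the worker catches up.
    tweet_queue.put(data)
    return True
```

A thread helps when the slow part is I/O such as database writes; CPU-heavy processing would need separate processes, which is one reason an external queue with separate workers is attractive.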

As an aside: if you're interested in tweets coming from Syria, rather than just tweets mentioning Syria, you could add to the filter like this:

stream.filter(track=[t], locations=[35.6626, 32.7930, 42.4302, 37.2182])

That's a very rough geobox centred on Syria, which will also pick up bits of Iraq and Turkey around the edges. Since this is an optional extra, it's worth pointing out:

Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.

(From this answer, which helped me, and the Twitter docs.)
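The locations list is [southwest longitude, southwest latitude, northeast longitude, northeast latitude], longitude first, as in the San Francisco example, and a tweet is delivered if it matches the track terms OR falls inside the box. The sketch below models that rule locally; in_box and matches_filter are hypothetical helpers for illustration only, and Twitter's real matching (tokenisation, place bounding boxes, and so on) is more involved.

```python
def in_box(lon, lat, box):
    """True if (lon, lat) lies inside box = [sw_lon, sw_lat, ne_lon, ne_lat]."""
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


def matches_filter(text, coords, track, box):
    """Model of the documented OR rule: a track term OR a location match.

    coords is a (lon, lat) pair, or None for a tweet with no geodata.
    Simplification: track matching is a plain case-insensitive substring
    test, not Twitter's actual tokenised matching.
    """
    text_hit = any(term.lower() in text.lower() for term in track)
    geo_hit = coords is not None and in_box(coords[0], coords[1], box)
    return text_hit or geo_hit


SYRIA_BOX = [35.6626, 32.7930, 42.4302, 37.2182]  # the rough box from the answer
```

Because of the OR, adding locations widens the stream rather than narrowing it; to get only tweets that are both from the box and mention the keyword, filter on one parameter and check the other yourself in on_data.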

Edit: I see from your subsequent posts that you're still going down the road of using the Twitter API, so hopefully you got this sorted anyway, but this may be useful for someone else! :)

Answered by Withnail (edited by Community)
  • An upvote on this just brought me back: it'd be really great if you could accept one of the answers if they address(ed) your problem, @hana, either mine or Luigi's. :) – Withnail Feb 10 '16 at 15:30

This worked for me.

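from tweepy import OAuthHandler, Stream
from urllib3.exceptions import ProtocolError  # raised when the connection drops mid-read

# StdOutListener is a StreamListener subclass such as TweetListener in the
# question; the keys are placeholders (access_token_secret = the question's access_secret).
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)
while True:
    try:
        stream.filter(track=['python', 'java'], stall_warnings=True)
    except (ProtocolError, AttributeError):
        # The stream was interrupted; loop around and reconnect.
        continue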
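A loop like this reconnects immediately and indefinitely. If the drops come from falling behind or from network trouble, a common refinement is to wait before reconnecting and lengthen the wait after each failure. The sketch below is a generic, library-independent helper: run_with_reconnect, its delay values, and the exceptions it retries on are illustrative assumptions, not Tweepy API. With the question's Tweepy 2.3 traceback, the exception to retry on would be httplib's IncompleteRead (http.client in Python 3).

```python
import time

try:
    from http.client import IncompleteRead  # Python 3
except ImportError:
    from httplib import IncompleteRead  # Python 2.7, as in the question's traceback


def run_with_reconnect(connect, retry_on=(IncompleteRead,), max_attempts=None,
                       base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call connect() again each time it raises one of retry_on.

    The delay doubles with each failure, capped at max_delay. If connect()
    returns normally, its result is returned. max_attempts=None retries
    forever. The delay values are illustrative assumptions.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except retry_on as exc:
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            print("stream dropped (%r); reconnecting in %.1fs" % (exc, delay))
            sleep(delay)
```

Usage would be along the lines of run_with_reconnect(lambda: stream.filter(track=[t], stall_warnings=True)). The sleep parameter exists only so the delays can be checked without actually waiting.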
Answered by Debjit Bhowmick

One solution is to restart the stream immediately after catching the exception.

# imports
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

# setting up the keys
consumer_key = "XXXXX"
consumer_secret = "XXXXX"
access_token = "XXXXXX"
access_secret = "XXXXX"

# printing all the tweets to the standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)


class TweetListener(StreamListener):
    # A listener handles tweets received from the stream.
    # This is a basic listener that just prints received tweets to standard output.
    def on_data(self, data):
        print(data)
        return True

    def on_exception(self, exception):
        print('exception', exception)
        start_stream()

    def on_error(self, status):
        print(status)


def start_stream():
    stream = Stream(auth, TweetListener())
    t = u"سوريا"
    stream.filter(track=[t])


start_stream()
Answered by Ario (edited by Guilherme do Valle)

In my case, the back-end application that the URL points to was returning the string directly.

I changed it to

return Response(response=original_message, status=200, content_type='application/text')

Originally, I just returned the text:

return original_message

This answer probably only applies to my specific case.

Answered by Hemanth Vatti (edited by stuckoverflow)