
Good morning everyone,
I'm having trouble with my Twitter bot - I need to dump the streamed tweets (which arrive as JSON) to a file. I previously did this by writing them out as UTF-8 formatted strings, but it now turns out that I still need to filter some of the data, so storing it in the file as JSON seemed like the easiest way to go. I edited the code accordingly:

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import datetime
import json

access_token = #####
access_token_secret = #####
consumer_key = #####
consumer_secret = #####

class StdOutListener(StreamListener):

    def on_status(self, status):
        print(status)
        today = datetime.datetime.now()
        with open('/git/twttrbots/data/Twitter_Raw %s' %
                        today.strftime("%a-%Y-%m-%d"), 'a') as f:
            json.dump(status, f)  # <- doesn't work
            #f.write(json.dumps(status))  # <- doesn't work
            #f.write("Blah")    # <- works perfectly fine

if __name__ == '__main__':
    while True:
        try:
            #login using auth
            l = StdOutListener()
            auth = OAuthHandler(consumer_key, consumer_secret)
            auth.set_access_token(access_token, access_token_secret)
            stream = Stream(auth, l)

            #filter by hashtag
            stream.filter(track=['bitcoin', 'cryptocurrency', 'wonderlandcoin',
                                    'btc', 'fintech', 'satoshi', 'blockchain',
                                        'litecoin', 'btce'])
        except:
            print("Whoops, disconnected at %s. Retrying"
                    % datetime.datetime.now())
            continue

The file is created, and the status definitely is read (there's print output in my terminal), but somewhere along the way my data is blasted out into nirvana instead of into my file - which remains empty at 0 bytes.

I found similar cases here and on other platforms; however, they used json.dumps() instead of json.dump(). I have tried both functions (using f.write(json.dumps(status))), but neither of them seems to work.
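For reference, the two functions differ only in where the encoded output goes; both require the object to be serializable in the first place. A minimal self-contained check (using io.StringIO as a stand-in for the open file):

```python
import io
import json

# Stand-in for a tweet dict; real tweets have many more fields
data = {"text": "hello", "retweeted": False}

# json.dumps returns the encoded JSON as a string
s = json.dumps(data)

# json.dump writes the same encoding to a file-like object
buf = io.StringIO()
json.dump(data, buf)

assert s == buf.getvalue()  # identical output, different destination
```

So if one of them fails on a given object, the other will fail for the same reason.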

Now, I'm not a complete fool; I am well aware that it's probably on my end - not a JSON error - but I can't figure out what it is I am doing wrong.

The only thing I was able to do is boil it down to an error that occurs in my with open() statement, leading me to believe it's something about either the open() mode or the way I write my data to the file. I know this because the above linked question's answer works fine on my machine.

I could, of course, use the subprocess module and call a pipe that dumps the print(status) output to a file, but that can't be the solution to this, can it?

Addendum
As requested, here's my console output.

Here's what the logger caught when I called logger.debug('status dump: %s', json.dumps(status)).

deepbrook
  • Can you log what the unencoded text looks like? It's possible that something in it invalidates the JSON format. – SuperBiasedMan Oct 06 '15 at 09:39
  • I added my print output as a pastebin link - I haven't put any logging in my code before, but I will as soon as I have it implemented. – deepbrook Oct 06 '15 at 10:18
  • Added what my logfile says. This is the first time I use logging, and I'm not sure this is the right way to do it. If you have any pointers, I'm listening :/ – deepbrook Oct 06 '15 at 10:34
  • There's a lot of information here to parse, but immediately I'd note that `json.dump` would throw errors for single quotes instead of doubles and for `False` not being in quotation marks. [This site](https://jsonformatter.curiousconcept.com/) is helpful for testing JSON data for validity. – SuperBiasedMan Oct 06 '15 at 10:38
  • The data coming out of the stream is definitely valid JSON - I dumped it to a file before using the CLI; I then extracted the data using the JSON module (read by line and json.loads(line)). It may just be the print() messing it up? Hence my suspicion it has something to do with the write mode. **edit**: Scratch the print() rambling; since dumping in CLI via '>' takes print()'s output, that can't be it. – deepbrook Oct 06 '15 at 10:43
  • I retract my previous statement - it's not working if I dump it via CLI anymore. But that still doesn't get me further down the road to a solution. So I know that for whatever reason the JSON format doesn't cut it - even though I pass it directly from the API to the module's dump function (I'm not touching it anywhere else). And since I am definitely not the only one using the tweepy library, or even streaming tweets from Twitter, where is the error in my code? :/ – deepbrook Oct 06 '15 at 11:01

1 Answer


An initial observation I have to make (because it got frustrating when I tried to run this script):

Don't make your except clauses too broad; you catch everything (including KeyboardInterrupt) and make it hard to stop execution.

This is optional, but it's good to handle that interrupt explicitly:

except KeyboardInterrupt:
    exit()

The second thing making your life harder is that not only are you catching everything with the bare except, you are also not printing the corresponding error. Printing it will catch the culprit in this case, point you in the right direction, and make your life much easier:

except Exception as e:
    print("Error: ", e)
    print("Whoops, disconnected at %s. Retrying"
            % datetime.datetime.now())
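To see why the bare except made the data seem to vanish, here's a minimal, hypothetical sketch (a plain class standing in for tweepy's Status object): the same exception disappears silently in the first form and is reported in the second.

```python
import json

class Status:
    """Hypothetical stand-in for tweepy's non-serializable Status object."""
    pass

# Bare except: the TypeError is swallowed, nothing is written,
# and nothing is printed - the data just seems to vanish.
try:
    json.dumps(Status())
except:
    pass

# Narrow except with the error printed: the culprit surfaces.
try:
    json.dumps(Status())
except Exception as e:
    print("Error:", e)  # mentions that Status is not JSON serializable
```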

This outputs a (rather disgusting) message which essentially prints out the Status object and ends with a line informing you that this object:

is not JSON serializable

This is somewhat logical, since what we are dealing with here isn't a JSON object but a Status object returned from tweepy.Stream.

I have no idea why exactly the creator(s) of tweepy have done this - I believe there are solid reasons behind it - but to solve your issue you can simply access the underlying JSON dict via the ._json attribute:

json.dump(status._json, f) 

Now, you should be good to go.


Can't convert 'bytes' object to str implicitly

This seems to be an internal tweepy issue related to the transition from Python 2 to Python 3.x. Specifically, in the file streaming.py:

File "/home/jim/anaconda/envs/Python3/lib/python3.5/site-packages/tweepy/streaming.py", line 171, in read_line
    self._buffer += self._stream.read(self._chunk_size) <--
TypeError: Can't convert 'bytes' object to str implicitly

First probable solution:

There has been a solution proposed (and according to the replies, working) on the tweepy GitHub repository by user cozos suggesting:

In streaming.py:

I changed line 161 to:

self._buffer += self._stream.read(read_len).decode('ascii')

and line 171 to:

self._buffer += self._stream.read(self._chunk_size).decode('ascii')

and then reinstalled.

Even though I'm not sure what he means by 'reinstalled'.
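The underlying issue is easy to demonstrate outside tweepy: in Python 3, a str buffer can't be extended with bytes, and socket reads return bytes, so the chunk must be decoded first. A minimal illustration (note that utf-8 would be a safer choice than the proposed ascii, since tweet text frequently contains non-ASCII characters):

```python
buffer = ""                  # tweepy keeps its read buffer as a str here
chunk = b'{"text": "hi"}'    # network reads return bytes in Python 3

try:
    buffer += chunk          # raises TypeError: str and bytes don't mix
except TypeError as e:
    print("without decode:", e)

buffer += chunk.decode("utf-8")  # decode first, then append
print(buffer)  # -> {"text": "hi"}
```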

Second solution:

Use tweepy with Python 2.7.10. It works like a charm.

Dimitris Fasarakis Hilliard
  • Thanks for the pointers! I'm now dumping via `json.dump(status._json, f)`, but the error now is: `Error: 'str' does not support the buffer interface`. :/ – deepbrook Oct 06 '15 at 11:25
  • I did find a useful pointer [here](http://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence). It's still not 100% the data I want dumped, but it DOES dump data in general, so I consider my question answered. Thank you! – deepbrook Oct 06 '15 at 11:35
  • I'm looking into the error, If I find something useful I'll update my answer accordingly :-) – Dimitris Fasarakis Hilliard Oct 06 '15 at 11:41
  • I edited my answer to include two alternate probable solutions. Check them out and if you have the energy, give em a shot. – Dimitris Fasarakis Hilliard Oct 06 '15 at 12:16
  • I know about solution one, and have actually done that already; it saved me tons of annoying encoding errors. But alas, that can't be it then. And regarding solution 2: Python 2 is the devil. Just kidding. I'll give it a shot, although we're trying not to code in 2.7 unless there is absolutely no other way. But I suppose I'm pretty close to that point. Thanks for the update! – deepbrook Oct 06 '15 at 12:20