
I am writing a simple Python stream listener using Twython (EDIT: a Python Twitter client library). When running the .py file, the output file's size oscillates between 1 and 5 kB. I would like to know what to do to make sure the file keeps growing as new data arrives. Below is the code.

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        with open(filename, 'w') as outfile:
            json.dump(data, outfile, indent=4)
            outfile.flush()
            outfile.close()

    def on_error(self, status_code, data):
        print(status_code)

stream = MyStreamer(APP_KEY, APP_SECRET,
                    OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
stream.statuses.filter(track=input_string)
asked by bmf (edited by Dan Lenski)
  • Why should the data written every time have the same size to begin with? How do you know that you are **reading** the same data every time? Have you tried adding `print(data)` or another debugging statement in the `on_success` function to check this assumption? – Dan Lenski Jun 18 '14 at 23:58
  • Are you asking how to append to a file instead of overwriting it? Use mode `a` instead of `w` when opening the file. But a sequence of JSON strings in a file is not a valid JSON file, so that's probably not a good idea. – Barmar Jun 19 '14 at 00:13
  • @Dan, while streaming, the output file size changes (i.e. 1, 2, 3, 1, 2, 1 kB, etc.) over a matter of seconds. It is not strictly increasing in size. – bmf Jun 19 '14 at 01:46
  • @Diabellical, **why is this behavior unexpected**? If you're streaming a bunch of data from Twitter, the amount of data you get will vary unpredictably. The output file is changing in size because you are **completely overwriting it** every time. – Dan Lenski Jun 19 '14 at 18:03
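The overwrite behavior described in the comments can be reproduced without Twython at all. This is just an illustrative sketch (the file path and sample dicts are made up, not from the question): mode `'w'` truncates the file on every `open()`, so only the last object survives, while mode `'a'` appends and the file keeps growing.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.json")

# Mode 'w' truncates the file on every open, so only the last dump survives.
for tweet in [{"id": 1, "text": "a" * 100}, {"id": 2, "text": "b"}]:
    with open(path, "w") as f:
        json.dump(tweet, f, indent=4)
size_w = os.path.getsize(path)  # size of the *last* object only

# Mode 'a' appends, so the file grows with each dump.
for tweet in [{"id": 1, "text": "a" * 100}, {"id": 2, "text": "b"}]:
    with open(path, "a") as f:
        json.dump(tweet, f, indent=4)
size_a = os.path.getsize(path)

print(size_a > size_w)  # True: the appended file keeps growing
```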

1 Answer


Your problem is not very clearly explained, but based on the comments above I think you are confused about the fact that the output file is constantly getting overwritten... rather than growing as new data is appended to it.

The problem is that your `open(filename, 'w')` overwrites the file every time it runs. Try doing this instead:

# open the file once, globally, so successive writes accumulate
outfile = open(filename, 'w')

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        json.dump(data, outfile, indent=4)
        outfile.flush()

    def on_error(self, status_code, data):
        print(status_code)

stream = MyStreamer(APP_KEY, APP_SECRET,
                    OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
stream.statuses.filter(track=input_string)

# when you are actually done writing output to it:
# outfile.close()

Note that this approach will not produce a valid JSON file, because you are just concatenating multiple chunks of JSON together. But that's a separate issue: JSON isn't intended to be a "streaming" format in the first place, though see this thread for some discussion.
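One common workaround (a sketch, not part of the answer above; the `tweets.jsonl` filename and sample dicts are assumptions) is to write one compact JSON object per line, the "JSON Lines"/NDJSON convention, so each line stays individually parseable even though the file as a whole is not a single JSON document:

```python
import json

# Hypothetical stand-in for streamed data; in the real code each dict
# would arrive via on_success().
events = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]

with open("tweets.jsonl", "a") as outfile:
    for data in events:
        # One compact JSON object per line -- no indent, newline-delimited.
        outfile.write(json.dumps(data) + "\n")
        outfile.flush()

# Reading back: parse each line independently.
with open("tweets.jsonl") as f:
    parsed = [json.loads(line) for line in f]
```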

answered by Dan Lenski