
I have a Python generator that yields parts of a file (a WSGI app_iter), and I need to pass it to an interface that expects the classic file methods read and readlines (I want to pass it as the wsgi.input of another Request).

Is it possible to do this without materializing the whole generator content in memory? The idea is to wrap the generator in something that has read and readline (like BytesIO or StringIO), but lazily.

martineau
enrico.bacis

1 Answer


It's certainly possible. Here's a woefully inefficient piece of code to give you the idea:

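```python
class ReadWrapper:
    """Expose readline() over a WSGI app_iter (an iterable of bytes chunks)."""

    def __init__(self, app_iter):
        self.iterator = iter(app_iter)
        # WSGI iterables yield bytes under Python 3, so buffer bytes, not str.
        self.buffer = b''

    def readline(self):
        # Keep pulling chunks until the buffer contains a complete line.
        while b'\n' not in self.buffer:
            try:
                self.buffer += next(self.iterator)
            except StopIteration:
                # Iterator exhausted: hand back whatever is left (b'' at EOF).
                result = self.buffer
                self.buffer = b''
                return result
        idx = self.buffer.find(b'\n')
        result = self.buffer[:idx + 1]
        self.buffer = self.buffer[idx + 1:]
        return result
```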

read() would be similar except that instead of looking for \n, you're looking for the specified number of bytes (or the end of the iterator if no size is specified).
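A minimal sketch of that read(), in the same deliberately simple style, assuming the app_iter yields bytes (as WSGI iterables do under Python 3). ChunkReader is a hypothetical name; an omitted or negative size drains the rest of the iterator.

```python
from typing import Iterable, Iterator, Optional


class ChunkReader:
    """Hypothetical wrapper exposing read() over an iterable of bytes chunks."""

    def __init__(self, app_iter: Iterable[bytes]) -> None:
        self.iterator: Iterator[bytes] = iter(app_iter)
        self.buffer = b''

    def read(self, size: Optional[int] = -1) -> bytes:
        if size is None or size < 0:
            # No size given: consume everything that is left.
            result = self.buffer + b''.join(self.iterator)
            self.buffer = b''
            return result
        # Pull chunks until we hold at least `size` bytes or the iterator ends.
        while len(self.buffer) < size:
            try:
                self.buffer += next(self.iterator)
            except StopIteration:
                break
        result = self.buffer[:size]
        self.buffer = self.buffer[size:]
        return result
```

Only as many chunks as needed to satisfy each call are pulled from the generator, so nothing is materialized up front unless you call read() with no size.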

The woeful inefficiency of the above code is in the way it handles self.buffer: you don't really want to be searching the whole thing for \n at every step, or doing so many potentially large copies.
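One way to avoid both problems, sketched under the same assumption of bytes chunks: hold only the current chunk plus an offset into it, search for `\n` starting from that offset, and collect the pieces of a line in a list that is joined once. IterStream is a hypothetical name and this is not a tuned implementation.

```python
from typing import Iterable, List, Optional


class IterStream:
    """Hypothetical file-like reader over an iterable of bytes chunks.

    Only the current chunk and an offset into it are kept, so each byte is
    scanned once and copied a bounded number of times.
    """

    def __init__(self, app_iter: Iterable[bytes]) -> None:
        self._iterator = iter(app_iter)
        self._chunk = b''  # chunk currently being consumed
        self._pos = 0      # offset of the first unread byte in self._chunk

    def _fill(self) -> bool:
        """Ensure unread bytes are available; return False at end of stream."""
        while self._pos >= len(self._chunk):  # also skips empty chunks
            try:
                self._chunk = next(self._iterator)
            except StopIteration:
                return False
            self._pos = 0
        return True

    def read(self, size: Optional[int] = -1) -> bytes:
        parts: List[bytes] = []
        remaining = size if size is not None and size >= 0 else None
        while (remaining is None or remaining > 0) and self._fill():
            end = len(self._chunk) if remaining is None else self._pos + remaining
            piece = self._chunk[self._pos:end]
            self._pos += len(piece)
            if remaining is not None:
                remaining -= len(piece)
            parts.append(piece)
        return b''.join(parts)

    def readline(self) -> bytes:
        parts: List[bytes] = []
        while self._fill():
            # Search only the unread part of the current chunk.
            idx = self._chunk.find(b'\n', self._pos)
            end = len(self._chunk) if idx < 0 else idx + 1
            parts.append(self._chunk[self._pos:end])
            self._pos = end
            if idx >= 0:
                break
        return b''.join(parts)

    def readlines(self) -> List[bytes]:
        # readline() returns b'' only at end of stream.
        return list(iter(self.readline, b''))
```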

Steve Jessop
    I thought there was something for this already in the standard library, or at least on PyPI; it seems like a good thing to put in a library. – enrico.bacis Apr 23 '16 at 09:40
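For what it's worth, the standard library's io module can supply most of this. If you subclass io.RawIOBase and implement only readable() and readinto(), wrapping it in io.BufferedReader gives you read(), readline(), readlines() and line iteration, with the generator pulled only as data is requested. A minimal sketch, assuming the app_iter yields bytes; IterRawIO and make_file are hypothetical names.

```python
import io
from typing import Iterable


class IterRawIO(io.RawIOBase):
    """Hypothetical raw stream that pulls bytes lazily from an iterable of chunks."""

    def __init__(self, app_iter: Iterable[bytes]) -> None:
        super().__init__()
        self._iterator = iter(app_iter)
        # memoryview slicing is O(1), so handing out a large chunk piecewise
        # does not copy its remainder on every call.
        self._leftover = memoryview(b'')

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Fetch a new chunk only when the previous one is used up; skip empty chunks.
        while not self._leftover:
            try:
                self._leftover = memoryview(next(self._iterator))
            except StopIteration:
                return 0  # 0 signals EOF to the buffered layer
        n = min(len(b), len(self._leftover))
        b[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def make_file(app_iter: Iterable[bytes],
              buffer_size: int = io.DEFAULT_BUFFER_SIZE) -> io.BufferedReader:
    # BufferedReader builds read(), readline(), readlines() on top of readinto().
    return io.BufferedReader(IterRawIO(app_iter), buffer_size=buffer_size)
```

Something like `environ['wsgi.input'] = make_file(app_iter)` would then hand the other request a file-like input. The generator is still consumed a chunk at a time, so a single very large chunk stays in memory until it has been read.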