I have a third-party library with a parser that expects a callback class with a new_token method. So far, my callback class and my token processing look like this:

class MySink(object):
    def __init__(self):
        self.tokens = []

    def new_token(self, token):
        self.tokens.append(token)

sink = MySink()
p = ThirdPartyParser(sink)
p.parse("my_data_file")

for t in sink.tokens:
    print t

The token list can get very long (leading to memory problems), so I'd like to turn MySink into an iterable class where the tokens aren't stored in a list but are "pulled" on the fly, with parsing paused while each token is processed. Something like this:

class MyIterableSink(object): # incomplete!
    def new_token(self, token):
        # TODO:
        # Store token for next iteration step
        # halt execution like with 'yield', wait for next iteration step

sink = MyIterableSink()
p = ThirdPartyParser(sink)
p.parse("my_data_file")

for t in sink:
    print t

How do I have to modify the MyIterableSink class? Is something like this possible? I can't modify the parser class, only the callback class. I know I have to implement the __iter__ and __next__ methods and probably use coroutines, where the tokens are sent with the send method, but I can't quite wrap my head around it. Any code examples would be appreciated.

chiborg
  • @PadraicCunningham Not sure what you mean by that. For me "Container" implies that all the tokens are stored internally (like the class does now). I don't want that. Instead, I want only one token to be stored at a time. – chiborg Mar 25 '15 at 10:04
  • I edited the example code to make it clearer. The tokens are pushed into the class with the `new_token` callback. – chiborg Mar 25 '15 at 10:07
  • Are the callbacks asynchronous? – Vincent Mar 25 '15 at 10:16
  • No, the callbacks are synchronous. – chiborg Mar 25 '15 at 10:32

2 Answers

The line

p.parse("my_data_file")

must be calling new_token in a loop. Since you can't change the way the third-party parser works, you are not in control of the way new_token gets called. Making MySink an iterator is not going to work since p.parse is not iterating over sink. So instead of making sink an iterator, simply process the tokens as new_token is called:

class MySink(object): 
    def new_token(self, token):
        # process token
        print(token)

sink = MySink()
p = ThirdPartyParser(sink)
p.parse("my_data_file")
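
Since the question mentions coroutines and send, here is a minimal sketch of how the same push-style idea could be written with a generator-based coroutine. The names CoroutineSink, consumer and FakeParser are made up for illustration; FakeParser is only a stand-in that assumes the real parser calls new_token once per token, and the upper() call is a placeholder for real processing.

```python
def consumer(results):
    """Coroutine that receives one token per send() call."""
    while True:
        token = yield                   # paused here until the sink sends a token
        results.append(token.upper())   # placeholder for real token processing


class CoroutineSink(object):
    """Hypothetical sink that forwards each token to a coroutine."""

    def __init__(self, coroutine):
        self.coroutine = coroutine
        next(self.coroutine)            # prime the coroutine up to its first yield

    def new_token(self, token):
        self.coroutine.send(token)      # resumes the coroutine with this token


class FakeParser(object):
    """Stand-in for ThirdPartyParser (assumption, for demonstration only)."""

    def __init__(self, sink):
        self.sink = sink

    def parse(self, data):
        for word in data.split():
            self.sink.new_token(word)


processed = []
sink = CoroutineSink(consumer(processed))
FakeParser(sink).parse("a b c")
print(processed)
```

Only one token is held at a time, but control still flows from the parser to the consumer; the coroutine does not turn the sink into something a for loop can pull from.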
unutbu

If the callbacks are asynchronous, you can use a Queue:

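try:
    from queue import Queue   # Python 3
except ImportError:
    from Queue import Queue   # Python 2

class MySink(object):
    def __init__(self):
        self.tokens = Queue()

    def new_token(self, token):
        self.tokens.put(token)

    def __iter__(self):
        # Blocks until a token is available; None marks the end of the stream
        token = self.tokens.get()
        while token is not None:
            yield token
            token = self.tokens.get()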

Note that you have to specify a stop condition for your iterator, like a timeout or a special token value (None in the example above).

EDIT: Since your callbacks are synchronous, unutbu said it all in his answer.
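
That said, if a pull-style for loop is really wanted with synchronous callbacks, one possible workaround is to run the parser in its own thread and hand tokens over through a bounded queue. The following is only a sketch under assumptions: ThreadedSink, run_parser, close and FakeParser are hypothetical names, FakeParser stands in for the real parser, and a private sentinel object is used as the stop condition instead of None.

```python
import threading

try:
    from queue import Queue   # Python 3
except ImportError:
    from Queue import Queue   # Python 2

_DONE = object()  # sentinel marking the end of the token stream


class ThreadedSink(object):
    """Hypothetical sink that hands tokens to a consumer in another thread."""

    def __init__(self):
        self.tokens = Queue(maxsize=1)   # at most one token waits in the queue

    def new_token(self, token):
        self.tokens.put(token)           # blocks the parser while the queue is full

    def close(self):
        self.tokens.put(_DONE)

    def __iter__(self):
        while True:
            token = self.tokens.get()    # blocks until the parser delivers a token
            if token is _DONE:
                return
            yield token


class FakeParser(object):
    """Stand-in for ThirdPartyParser (assumption, for demonstration only)."""

    def __init__(self, sink):
        self.sink = sink

    def parse(self, data):
        for word in data.split():
            self.sink.new_token(word)


def run_parser(parser, sink, data):
    try:
        parser.parse(data)
    finally:
        sink.close()                     # always signal the end, even on errors


sink = ThreadedSink()
worker = threading.Thread(target=run_parser, args=(FakeParser(sink), sink, "a b c"))
worker.daemon = True                     # do not keep the process alive if iteration stops early
worker.start()

collected = [t for t in sink]
worker.join()
print(collected)
```

The bounded queue keeps memory use flat because the parser thread blocks in put until the loop has taken the previous token. Two caveats: if the loop is abandoned early, the parser thread stays blocked, and an exception raised inside parse is not propagated to the consuming thread in this sketch.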

Vincent