4

I am working on a server application which receives data over a TCP socket in an XMPP-like XML format, i.e. every child of the <root> element essentially represents one separate request (stanza). The connection is closed as soon as </root> is received. I do know that I must use a stream parser like SAX, somehow. Though, for convenience, I'd prefer to have a tree-like interface to access each stanza's child elements. (The data sent with every request is not large so I think it makes sense to read each stanza as a whole.)

What's the best way to realize that in Python (preferably v3)?

This is the code I'd like to build it in. Feel free to point me in a totally different direction to solve this issue.

```python
import socketserver
import settings

class MyServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass

class MyRequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        pass

if __name__ == '__main__':
    server = MyServer((settings.host, settings.port), MyRequestHandler)
    server.serve_forever()
```
balu

2 Answers

3

You'll want to use a push-based parser that emits SAX events. Basically, you want a parser on which you can call pushChunk(data) with a partial chunk of data, and which invokes an event handler on the end-tag event of each first-level child. That handler is what produces your stanzas, which can then be handed to the application's processing logic.

If you want to see an example of this, here is the expat parser for libstrophe, an XMPP client library I wrote: http://github.com/metajack/libstrophe/blob/master/src/parser_expat.c

Building a whole document for each stanza is quite expensive. It is possible to implement this with a single parser instance, as opposed to continually making new document parsers for each stanza.

If you need a working Python version, you can probably use or pull out the code from Twisted Words (twisted.words.xish I believe).
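For a standard-library-only take on the same idea, here is a minimal sketch using `xml.etree.ElementTree.XMLPullParser`, which accepts partial chunks through `feed()`. The class name `StanzaReader` and its `closed` flag are made up for illustration; they are not taken from libstrophe or Twisted. One parser instance is kept for the whole connection, and each completed first-level child is detached from the root so the tree does not grow.

```python
import xml.etree.ElementTree as ET


class StanzaReader:
    """Hypothetical helper: feed raw chunks, get back completed stanzas."""

    def __init__(self):
        # One parser instance for the whole connection.
        self._parser = ET.XMLPullParser(events=("start", "end"))
        self._depth = 0
        self._root = None
        self.closed = False  # becomes True once the root end tag is seen

    def feed(self, data):
        """Feed one chunk; return the stanzas completed by this chunk."""
        self._parser.feed(data)
        stanzas = []
        for event, elem in self._parser.read_events():
            if event == "start":
                self._depth += 1
                if self._depth == 1:
                    self._root = elem  # the stream root itself
            else:
                self._depth -= 1
                if self._depth == 1:
                    # A first-level child just ended: that is one stanza.
                    stanzas.append(elem)
                    self._root.remove(elem)  # keep the root from growing
                elif self._depth == 0:
                    self.closed = True
        return stanzas
```

Inside `handle()`, one could loop on `self.request.recv(4096)`, pass each chunk to `feed()`, process the returned elements with the usual ElementTree API (`find`, `attrib`, `text`), and stop once `closed` is true or `recv` returns an empty bytes object.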

metajack
  • Another trick is to use a single element pointer as the stack for your current position. When you get a new element event, you create an element in your dom. If the stack is not null, you add this element as a child to the stack element, and set the stack pointer to the new element. When you get an end element event, you set the stack pointer to the parent of the current stack pointer. If the stack pointer is nil at the end of this operation, you have a stanza. Note: this is what Jack's code linked to above more-or-less does. – Joe Hildebrand May 05 '10 at 05:27
  • Just in case anyone needs a Python solution for this one: http://stackoverflow.com/questions/1459648/non-blocking-method-for-parsing-streaming-xml-in-python (the post marked as accepted answer). – balu May 19 '10 at 23:20
  • @JoeHildebrand How would you ever get a nil stack pointer? An XMPP XML stream starts with `<stream:stream>` and doesn't get closed until it's time to disconnect; you'd **never** end up with your stanza, as you'd never reach the end of your element. Perhaps expand on what you mean by a stack; is it a stack of XmlDomElement objects? (The worst part about XMPP is having to reinvent a complete XML parser from scratch...) – Ian Boyd Nov 03 '19 at 12:41
  • Yes, a stack of DOM elements. This needs to be relatively XMPP-specific, in that you keep track of the state of the document on startup, and ensure that none of the stanzas are actually children of the stream:stream root. – Joe Hildebrand Nov 04 '19 at 18:35
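
The single-pointer trick from the comments above could look roughly like this in Python with `xml.parsers.expat`. This is only a sketch: `Node` and `StackPointerParser` are hypothetical names, and the root element is tracked as a flag rather than as a node, so that the pointer really does return to `None` at the end of every stanza.

```python
import xml.parsers.expat


class Node:
    """Tiny DOM-like node with a parent link (illustrative only)."""

    def __init__(self, name, attrs, parent=None):
        self.name = name
        self.attrs = attrs
        self.parent = parent
        self.children = []
        self.text = ""  # simplification: all character data directly inside


class StackPointerParser:
    def __init__(self, on_stanza):
        self._on_stanza = on_stanza
        self._current = None     # the single "stack pointer"
        self._root_seen = False  # the stream root is state, not a node
        self.closed = False
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._parser.CharacterDataHandler = self._chars

    def feed(self, data, final=False):
        self._parser.Parse(data, final)

    def _start(self, name, attrs):
        if not self._root_seen:
            self._root_seen = True  # never make stanzas children of the root
            return
        node = Node(name, attrs, self._current)
        if self._current is not None:
            self._current.children.append(node)
        self._current = node  # push

    def _end(self, name):
        if self._current is None:
            self.closed = True  # this was the root end tag
            return
        finished = self._current
        self._current = finished.parent  # pop
        if self._current is None:
            self._on_stanza(finished)  # pointer is nil again: stanza complete

    def _chars(self, data):
        if self._current is not None:
            self._current.text += data
```

The callback passed as `on_stanza` receives each finished top-level node; whitespace between stanzas is ignored because the pointer is `None` at that point.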
1

For Skates, we use a SAX parser to read the stream, and use that same parser to build a whole document for each stanza received.
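
To give a rough idea of what that can look like in Python, here is a sketch built on `xml.sax` and `xml.dom.minidom`. It is not the Skates code (which I am not reproducing here); the handler name `DocPerStanzaHandler` is invented for this example. A single incremental SAX parser reads the stream, and the handler starts a fresh `minidom.Document` for every first-level child.

```python
import xml.sax
from xml.dom import minidom


class DocPerStanzaHandler(xml.sax.ContentHandler):
    """Illustrative handler: one minidom Document per first-level child."""

    def __init__(self, on_stanza):
        super().__init__()
        self._on_stanza = on_stanza
        self._depth = 0
        self._doc = None
        self._node = None  # node currently being filled

    def startElement(self, name, attrs):
        self._depth += 1
        if self._depth == 1:
            return  # the stream root is not part of any stanza
        if self._depth == 2:
            self._doc = minidom.Document()  # new document for this stanza
            self._node = self._doc
        elem = self._doc.createElement(name)
        for key, value in attrs.items():
            elem.setAttribute(key, value)
        self._node.appendChild(elem)
        self._node = elem

    def characters(self, content):
        if self._depth >= 2:
            self._node.appendChild(self._doc.createTextNode(content))

    def endElement(self, name):
        self._depth -= 1
        if self._depth >= 1:
            self._node = self._node.parentNode
        if self._depth == 1:
            self._on_stanza(self._doc)  # stanza finished
            self._doc = self._node = None


# Usage: the parser returned by make_parser() can be fed chunk by chunk.
docs = []
parser = xml.sax.make_parser()
parser.setContentHandler(DocPerStanzaHandler(docs.append))
parser.feed(b"<root><a x='1'><b>t")
parser.feed(b"</b></a><c/>")
parser.feed(b"</root>")
parser.close()
```

After this runs, `docs` holds one document per stanza, each usable through the normal DOM calls such as `documentElement` and `getElementsByTagName`.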

Julien Genestoux