I have seen How to parse multiple XML documents from a single stream? and Parse XML stream from TCP socket,
which ask essentially the same question for different languages, but the answers do not really apply here.
I have an application that I cannot modify. It sends multiple XML documents on a TCP stream. Somehow I need to parse the data and process it.
One of the documents sent looks like this (yes, there is no XML declaration):
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>message</key>
<string>system.mainStation</string>
<key>packet type</key>
<string>Perform Command</string>
</dict>
</plist>
Both Java's DOM and SAX parsers complain about illegal characters in the document prolog. I also tried ANTLR4, but the generated lexer does not produce tokens as they are discovered on the stream; it emits them only at the end, when the connection is closed.
How would I configure an XML parser to actually accept such documents or to emit content it has gathered so far?
EDIT: I had overlooked something very basic that is even considered best practice when sending data over TCP (see https://mina.apache.org/mina-project/userguide/ch9-codec-filter/ch9-codec-filter.html, section How?): the data did in fact contain a length header. So the parser was right all along...
I finally fixed it by processing that header first. Then the parser complained about not being able to download the DTD, which I fixed following https://stackoverflow.com/a/155353/4222206
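For anyone running into the same thing, here is a minimal sketch of the two fixes. It rests on assumptions, since the exact header layout is not shown above: it assumes each document is preceded by a 4-byte big-endian length, and the class name FramedXmlReader, its method names, and the 1 MiB size limit are all made up for illustration. To avoid the DTD download it uses a standard EntityResolver that returns an empty source, which is one possible approach and not necessarily the one from the linked answer.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class FramedXmlReader {

    // Hypothetical sanity limit so a corrupt header cannot trigger a huge allocation.
    private static final int MAX_FRAME_BYTES = 1 << 20;

    /**
     * Reads one length-prefixed frame.
     * ASSUMPTION: the header is a 4-byte big-endian int; adjust to the real protocol.
     * Returns null if the stream ends before a complete header could be read.
     */
    static byte[] readFrame(DataInputStream in) throws IOException {
        final int length;
        try {
            length = in.readInt();
        } catch (EOFException e) {
            return null;
        }
        if (length < 0 || length > MAX_FRAME_BYTES) {
            throw new IOException("Implausible frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until the whole document has arrived
        return payload;
    }

    /** Creates a non-validating builder that never fetches the external DTD. */
    static DocumentBuilder newOfflineBuilder() throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve every external entity (including the plist DTD) to an empty source.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder;
    }

    /** Parses exactly one document from the bytes of one frame. */
    static Document parseFrame(DocumentBuilder builder, byte[] payload)
            throws IOException, SAXException {
        return builder.parse(new ByteArrayInputStream(payload));
    }
}
```

With a real connection, one would wrap socket.getInputStream() in a DataInputStream and call readFrame in a loop until it returns null, handing each payload to parseFrame. Because every parse sees exactly one document, the parser no longer has to find document boundaries on the stream.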