I have seen "How to parse multiple XML documents from a single stream?" and "Parse XML stream from TCP socket", which ask essentially the same question for other languages, but their answers do not really apply here.

I have an application that I cannot modify. It sends multiple XML documents on a TCP stream. Somehow I need to parse the data and process it.

One of the documents sent looks like this (yes, there is no XML declaration):

<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>message</key>
        <string>system.mainStation</string>
        <key>packet type</key>
        <string>Perform Command</string>
    </dict>
</plist>

Both Java's DOM and SAX parsers complain about illegal characters in the document prolog. I also tried ANTLR4, but the generated lexer does not emit tokens as they arrive on the stream; it only produces them at the end, when the connection is closed.

How would I configure an XML parser to actually accept such documents or to emit content it has gathered so far?

EDIT: I had overlooked something very basic that is even considered best practice when sending data over TCP (see https://mina.apache.org/mina-project/userguide/ch9-codec-filter/ch9-codec-filter.html, section "How?"): the data did in fact contain a length header. So the parser was right all along...
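For the next reader, here is a minimal sketch of reading length-prefixed documents from a stream before handing them to the parser. It assumes each document is preceded by a 4-byte big-endian length and encoded as UTF-8; the actual header layout and encoding used by the sending application may well differ. The class name LengthPrefixedReader and its methods are made up for illustration.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LengthPrefixedReader {

    // Assumption: the header is a 4-byte big-endian payload length.
    // Adjust this to whatever the sending application really uses.
    static byte[] readFrame(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException e) {
            // The stream ended before a complete header could be read.
            return null;
        }
        if (length < 0) {
            throw new IOException("Negative frame length: " + length);
        }
        byte[] payload = new byte[length];
        // Blocks until the whole payload has arrived, or throws EOFException.
        in.readFully(payload);
        return payload;
    }

    // Reads frames until the stream ends and returns them as strings.
    // Assumption: the payload is UTF-8 text.
    static List<String> readAll(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        List<String> documents = new ArrayList<>();
        byte[] frame;
        while ((frame = readFrame(in)) != null) {
            documents.add(new String(frame, StandardCharsets.UTF_8));
        }
        return documents;
    }
}
```

With a socket, you would pass socket.getInputStream() and process each frame as soon as readFrame returns, instead of collecting them in a list.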

So finally I fixed it by processing that data first. Then the parser complained about not being able to download the DTD, which I fixed following https://stackoverflow.com/a/155353/4222206
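As a rough illustration of the DTD part, the sketch below keeps a DOM parser from fetching the external DTD by answering every external entity request with an empty document. This is one approach using only the standard JAXP API and is not necessarily what the linked answer does; the class name OfflinePlistParser is made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class OfflinePlistParser {

    // Parses one complete XML document without any network access for the DTD.
    static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // No DTD validation; the DOCTYPE is only tolerated, not enforced.
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve every external entity (including the plist DTD) to an
        // empty document, so the parser never tries to download it.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new ByteArrayInputStream(xml));
    }
}
```

Because the DTD is replaced by an empty one, any entities or default attribute values it defines are not available; for documents like the one above that should not matter.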

Queeg
  • Capture the stream as plain text, then split it up in your code into multiple documents that you can process one by one. – Thorbjørn Ravn Andersen Jun 25 '21 at 22:53
  • Actually I was able to resolve the problem, but not in the parser. It was the component that received these many documents over the network that omitted some bytes. Once fixed, the documents arrived correctly at the parser and the problem was gone. – Queeg Oct 07 '21 at 22:06
  • Write a full answer useful to the next person that comes by with your problem. – Thorbjørn Ravn Andersen Oct 08 '21 at 06:19

0 Answers