3

I have a socket connection to an external system which accepts commands and sends results in XML. Every command and result is a standalone XML document.

Which Java parser (or combination of parsers) should I use to:

  • parse the stream continuously without closing the connection (I know it sounds stupid, but I tried DOMParser in the past and it throws an exception when another document root is encountered on the stream, which is perfectly understandable). I need something like: continuously read the stream and, when a document is fully received, process it. I don't know how big the document is, so I need to leave it to the parser to figure out where the document ends.
  • deserialize every incoming document into bean instances (similar to what XStream does)
  • serialize command objects to the output stream from annotated class instances (again, similar to what XStream does). I don't want to use two separate libraries for sending and receiving.
NagyI
  • 1
    Check out [this](http://stackoverflow.com/questions/3302575) posting; it might answer your question. – Andreas Baus Feb 08 '12 at 11:55
  • Sadly I can't use that, because I don't get any processing header which could be used to separate documents. The encoding is always UTF-8, so the header is simply omitted. – NagyI Feb 08 '12 at 12:13

3 Answers

2

Well... XStream.createObjectInputStream seems to be what you need. I'm not sure whether the stream provided must enclose all objects in a root node, but in any case you could arrange an input stream that adds some virtual content to accommodate XStream's needs. I'll expand this answer later...

http://x-stream.github.io/objectstream.html has some samples...

Root node

Indeed, the reader needs a root node. So you need an input stream that reads <object-stream>, then the real byte content, then a </object-stream> at the end (if you care about that end). Depending on what you need (input streams, readers) the implementation can be slightly different, but it can be done.

Sample

You can use SequenceInputStream to concatenate virtual content to the original inputstream:

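InputStream realOne = ..
// Beware of the encoding! The virtual prefix must use the same encoding as the real stream.
InputStream root = new ByteArrayInputStream("<object-stream>".getBytes(StandardCharsets.UTF_8));
// SequenceInputStream serves the virtual root first, then the real content.
InputStream all = new SequenceInputStream(root, realOne);

ObjectInputStream in = xstream.createObjectInputStream(all); // voilà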

If you use readers... well. There must be something equivalent :)
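To see the virtual-root idea working without XStream on the classpath, here is a minimal sketch that swaps in the JDK's StAX reader (`javax.xml.stream`) in place of `xstream.createObjectInputStream`. The class and method names (`VirtualRootDemo`, `completedDocuments`) are made up for illustration, and it assumes, as stated in the question, that the documents carry no XML declaration. It also appends the closing `</object-stream>` so the parse ends cleanly once the real stream is exhausted.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class VirtualRootDemo {

    /** Returns the root element name of every document that was fully received. */
    static List<String> completedDocuments(InputStream realOne) throws XMLStreamException {
        // Virtual content wrapped around the real stream (hypothetical wrapper name
        // taken from the answer above).
        InputStream prefix = new ByteArrayInputStream("<object-stream>".getBytes(StandardCharsets.UTF_8));
        InputStream suffix = new ByteArrayInputStream("</object-stream>".getBytes(StandardCharsets.UTF_8));
        List<InputStream> parts = Arrays.asList(prefix, realOne, suffix);
        InputStream all = new SequenceInputStream(Collections.enumeration(parts));

        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(all, "UTF-8");
        List<String> completed = new ArrayList<>();
        int depth = 0;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
                // Back at depth 1 means a direct child of the virtual root just closed,
                // i.e. one standalone document has been read completely.
                if (depth == 1) {
                    completed.add(reader.getLocalName());
                }
            }
        }
        reader.close();
        return completed;
    }

    public static void main(String[] args) throws XMLStreamException {
        String twoDocuments = "<command id=\"1\"/><result><status>ok</status></result>";
        InputStream fake = new ByteArrayInputStream(twoDocuments.getBytes(StandardCharsets.UTF_8));
        System.out.println(completedDocuments(fake));
    }
}
```

In a real application `realOne` would be the socket's input stream, and the place where the sketch records a name is where each document would be handed off for processing.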

facundofarias
helios
  • Worth a try. But I don't really like using a hacked solution for this. Hopefully I won't need the forgery you mentioned at the end. – NagyI Feb 08 '12 at 12:08
  • Well, if the sender builds the XML with a root there will be no problem. Another way would be to insert markers to indicate the different parts, so you can easily find the end. – helios Feb 08 '12 at 12:17
  • Changed PushbackInputStream for SequenceInputStream. It's more aligned with the idea of reading two things, one first then the other. – helios Feb 10 '12 at 11:55
  • I implemented this based on your solution over the past weekend and it worked as expected. Thank you! – NagyI Feb 14 '12 at 08:56
  • 1
    I've also made the reverse of this: streaming to the socket output with the XStream header bypassed. I wrote a simple `FilterOutputStream` implementation which discards the header. And it's working too :) – NagyI Feb 14 '12 at 09:35
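The header-discarding `FilterOutputStream` mentioned in the last comment was not posted, so what follows is only a guess at what such a class might look like. It assumes the header is a fixed, known number of bytes at the very start of the output (for example the length of `<object-stream>` in UTF-8); the class name `HeaderSkippingOutputStream` and the `headerLength` parameter are hypothetical, and the sketch does not deal with any closing tag written when the stream is closed.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Drops the first headerLength bytes written, then passes everything through. */
class HeaderSkippingOutputStream extends FilterOutputStream {

    private int remaining; // header bytes still to be discarded

    HeaderSkippingOutputStream(OutputStream out, int headerLength) {
        super(out);
        this.remaining = headerLength;
    }

    @Override
    public void write(int b) throws IOException {
        if (remaining > 0) {
            remaining--; // still inside the header: swallow the byte
            return;
        }
        out.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Skip whatever part of this chunk still belongs to the header.
        int skip = Math.min(remaining, len);
        remaining -= skip;
        out.write(buf, off + skip, len - skip);
    }

    public static void main(String[] args) throws IOException {
        byte[] header = "<object-stream>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the socket
        try (OutputStream out = new HeaderSkippingOutputStream(sink, header.length)) {
            out.write("<object-stream><command id=\"1\"/>".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Counting bytes keeps the filter trivial, but it only works if the serializer writes exactly the expected header and nothing else (such as extra whitespace) before the first real document.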
0

Your best bet is probably a SAX parser. With it, you can implement a ContentHandler and, in its endDocument method, do the processing and prepare for the next document. Have a look at this page for an explanation and examples: http://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html
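As a rough illustration of the endDocument hook, the sketch below parses a single document with the JDK's SAX parser and flags completion in `endDocument`. The names `EndDocumentDemo` and `CollectingHandler` are invented for the example, and it assumes one document per `parse` call; it does not address how to split several documents arriving on the same stream.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class EndDocumentDemo {

    /** Collects element names and notes when the document has been fully parsed. */
    static class CollectingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        boolean finished;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elements.add(qName);
        }

        @Override
        public void endDocument() {
            // The whole document has been seen: this is where processing would start.
            finished = true;
        }
    }

    static CollectingHandler parse(String xml) throws Exception {
        CollectingHandler handler = new CollectingHandler();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CollectingHandler handler = parse("<result><status>ok</status></result>");
        System.out.println(handler.elements + " finished=" + handler.finished);
    }
}
```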

Aleks G
  • SAX is an event-based parser. Is there any solution that can deserialize objects through SAX? E.g. XStream can't use SAX as a reader. – NagyI Feb 08 '12 at 12:10
  • You should be able to initialise a SAX parser with an `InputSource` based on your `InputStream`. Have a look at this page: http://www.ibm.com/developerworks/xml/library/x-tipsaxis/index.html – Aleks G Feb 08 '12 at 12:24
  • If you want to deserialize your object, you are keeping the object in memory after deserialization no matter how long the input XML is. If that is the case, are you sure you cannot do as gasan suggests (get the whole response and parse it)? You are keeping the entire message in memory anyway (though the object representation will probably be smaller). – fpacifici Feb 10 '12 at 15:40
0

I'd say you read one complete response, then parse it. Then read the next one. I see no need to continuously read responses.

dhblah
  • Sadly I don't know how long a document will be (I added this to the question, thanks for mentioning it). I must leave it to the parser to figure out when an XML document ends. – NagyI Feb 08 '12 at 12:06