I’m trying to parse text from a file that comes in a pseudo-XML format. I can get a DOM document out of it when it has the following structure:
<product>
<product_id>234567</product_id>
<description>abc</description>
</product>
The problem I’m running into happens when the structure is similar to the following:
<product>
<product_id>234567</product_id>
<description>abc</description>
<quantity 1:2>
<version>1.1</version>
</quantity 1:2>
<version>1.2</version>
<quantity 2:2>
</quantity 2:2>
</product>
It generates the following exception due to the space in <quantity 1:2>:
org.xml.sax.SAXParseException:[Fatal Error] :1:167: Element type " quantity " must be followed by either attribute specifications, ">" or "/>"
I can get around this by replacing the space with an underscore. The problem is that the structure can vary in size and include several child nodes with the same format (<node 1:x>), and the file can contain hundreds of structures to parse. Is there a class available that will parse text like this and return a tree-like object?
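For reference, here is a minimal sketch of the underscore workaround described above, applied to every tag at once with a regular expression before handing the text to the standard DOM parser. The class name PseudoXmlParser and the helpers normalize and parse are made up for illustration. It assumes every problem tag looks like <name N:M> with a single space, and that the default (non-namespace-aware) DocumentBuilderFactory is used, so the remaining colon in the element name is accepted.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class PseudoXmlParser {

    // Assumption: problem tags have the form <name N:M> or </name N:M>.
    private static final Pattern INDEXED_TAG =
            Pattern.compile("<(/?)(\\w+) (\\d+:\\d+)>");

    // Replace the space inside each indexed tag with an underscore,
    // e.g. <quantity 1:2> becomes <quantity_1:2>.
    static String normalize(String text) {
        return INDEXED_TAG.matcher(text).replaceAll("<$1$2_$3>");
    }

    // Normalize the text, then parse it into a DOM tree.
    static Document parse(String text) {
        try {
            // The default factory is not namespace-aware, so the colon
            // left in the element name is treated as an ordinary character.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(normalize(text))));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse pseudo-XML input", e);
        }
    }

    public static void main(String[] args) {
        String input =
                "<product>\n"
              + "<product_id>234567</product_id>\n"
              + "<description>abc</description>\n"
              + "<quantity 1:2>\n"
              + "<version>1.1</version>\n"
              + "</quantity 1:2>\n"
              + "</product>";
        Document doc = parse(input);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Since the pattern is applied to the whole text in one pass, it should not matter how many <node 1:x> children a structure has or how many structures the file contains, as long as they follow the assumed tag format.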