
I have a huge XML file in which the text of some fields is actually a large chunk of XML, written as escaped character codes like the following:

&lt;RESPONSE_DATA&gt;
    &lt;RESULT&gt;
        &lt;BLOCK&gt;
 ....
        &lt;/BLOCK&gt;
    &lt;/RESULT&gt;
&lt;/RESPONSE_DATA&gt;

Are there any readily available tools or libraries to convert that into proper XML? Anything would do: Java, C#, or other code, or a standalone tool.

Thank you.

Charbel
  • All you have to do is replace each &lt; with < and &gt; with >. Then load it into an `XmlDocument`. Did I miss something? – DJ Quimby Oct 20 '11 at 15:13
  • Is that all that needs to change in the XML? Are < and > the only restricted characters? – Charbel Oct 20 '11 at 15:19
  • Given the example you have above, yeah, it's that easy. – DJ Quimby Oct 20 '11 at 15:20
  • There might be other encoded characters in there, like `&quot;`, or a numeric character reference. – svick Oct 20 '11 at 15:21
  • @DJQuimby (I put ... in the example to keep it generic.) I'm interested in a library or tool that handles all cases. Thanks, svick, I suspected there could be other possibilities. – Charbel Oct 20 '11 at 15:28
  • I just thought about this: what if we have XML which contains another XML document, which contains another, which contains another that includes the original XML file again... –  Oct 20 '11 at 15:29
  • "Huge" for some people means 10 MB, for other people it means 10 GB. There's no point telling us it's huge without giving us a number. – Michael Kay Oct 20 '11 at 16:21
  • @VladLazarenko, it's XML all the way down! – svick Oct 20 '11 at 18:41
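To illustrate svick's point, here is a minimal Python sketch (the field content is hypothetical) of what the replace-only approach does when the nested XML quotes an attribute value with `&quot;`: the result is not well-formed, so loading it fails.

```python
import xml.etree.ElementTree as ET

# Hypothetical escaped field content: the nested XML quotes an attribute with &quot;
escaped = "&lt;BLOCK id=&quot;1&quot;&gt;text&lt;/BLOCK&gt;"

# Replace-only approach: decode &lt; and &gt;, leave everything else untouched
naive = escaped.replace("&lt;", "<").replace("&gt;", ">")
print(naive)  # <BLOCK id=&quot;1&quot;>text</BLOCK>

try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError as err:
    # id=&quot;1&quot; is not a quoted attribute value, so the parser rejects it
    naive_ok = False
    print("not well-formed:", err)
```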

2 Answers


When the XML parser sees input like this:

<p>
&lt;RESPONSE_DATA&gt;
    &lt;RESULT&gt;
        &lt;BLOCK&gt;
 ....
        &lt;/BLOCK&gt;
    &lt;/RESULT&gt;
&lt;/RESPONSE_DATA&gt;
</p>

it will present the application with a text node whose content is:

<RESPONSE_DATA>
    <RESULT>
        <BLOCK>
 ....
        </BLOCK>
    </RESULT>
</RESPONSE_DATA>

The "readily available tool or library" that can handle this is an XML parser. When you have escaped XML nested inside XML, you need two parsing passes: the first parse extracts the nested XML as a string-with-angle-brackets, which you then pass to another XML parser to analyse its structure.
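The two passes can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a minimal illustration, not a complete tool: the `<p>` wrapper and the `some text` content are placeholders standing in for the real field and the elided `....`.

```python
import xml.etree.ElementTree as ET

# Outer document modelled on the example above; the <p> field holds escaped XML.
# The BLOCK content "some text" is a placeholder, not real data.
outer = """<p>
&lt;RESPONSE_DATA&gt;
    &lt;RESULT&gt;
        &lt;BLOCK&gt;some text&lt;/BLOCK&gt;
    &lt;/RESULT&gt;
&lt;/RESPONSE_DATA&gt;
</p>"""

# Pass 1: parse the outer XML; the parser turns the entities back into < and >
field_text = ET.fromstring(outer).text

# Pass 2: parse the recovered string as an XML document in its own right
inner = ET.fromstring(field_text.strip())

print(inner.tag)                         # RESPONSE_DATA
print(inner.find("RESULT/BLOCK").text)   # some text
```

For a file too large to load at once, `ElementTree.iterparse` can stream the outer document instead, applying the second pass to each field as its end tag is reached and calling `clear()` on elements already processed.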

Michael Kay

If you parse the whole XML document and access this value, any XML parser should return it properly decoded. I'm pretty sure the ones in .NET do.
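A small Python check (standard library only; the field content is made up) shows that the parser resolves decimal and hexadecimal character references as well as named entities, and that it decodes exactly one level: `&amp;amp;` becomes `&amp;`, which is the correct escaping for the nested document.

```python
import xml.etree.ElementTree as ET

# Hypothetical field escaping the nested XML in several ways:
# &#60; (decimal), &#x3E; (hex), &quot; / &lt; / &gt; / &amp; (named entities)
raw = '<FIELD>&#60;BLOCK id=&quot;7&quot;&#x3E;a &amp;amp; b&lt;/BLOCK&gt;</FIELD>'

# One parse decodes one level of escaping in the field's text
decoded = ET.fromstring(raw).text
print(decoded)  # <BLOCK id="7">a &amp; b</BLOCK>
```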

svick