0

I want to extract an HTML page from an XML file. Any ideas, please?

 <?xml ....>
      <first>
      </first>

         <second>
         </second>
      <xhtml>
          <html>
              .....some html code here
          </html>
      </xhtml>

I want to extract the HTML page exactly as it is from the XML above.

af_khan

3 Answers

0

Because XML and HTML markup are similar, an XML parser may have trouble with HTML embedded directly in the document. I would suggest that when you save the HTML data in the XML file, you encode (escape) it so that the parser treats it as plain text. Then, when you read the data back from the XML, you just need to decode it for use.

<?xml ....?>
<first></first>
<second></second>
<markup>
    &lt;html&gt;
        code here
    &lt;/html&gt;
</markup>

When you decode the markup section, it will look like this:

<html>
    code here
</html>
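
The decode step can be sketched in Python. This is a minimal illustration, not part of the original answer: it assumes the escaped HTML sits in a `markup` element as shown above, and it adds a hypothetical `<root>` wrapper because an XML document needs a single root element. The standard-library parser converts `&lt;` and `&gt;` back to `<` and `>` when it returns the element text, so no separate decode call is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <root> is an assumed wrapper element, added only
# so that the document has a single root and parses as XML.
xml_text = """<root>
<first></first>
<second></second>
<markup>
    &lt;html&gt;
        code here
    &lt;/html&gt;
</markup>
</root>"""

root = ET.fromstring(xml_text)

# The parser resolves the entity references while reading the text,
# so .text already contains literal < and > characters.
decoded_html = root.find("markup").text.strip()
print(decoded_html)
```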
Kenneth Garza
  • I am not generating this XML. It comes as a response from a web service, so I have no control over what I get. – af_khan Apr 15 '13 at 12:06
  • Ahh OK, well then no fun there. I am not awesome with Java (or even decent at it), so I can be of little help on that front. Good luck though. – Kenneth Garza Apr 15 '13 at 12:12
0

You might find this of some use:

http://www.w3schools.com/xml/xml_parser.asp

You can extract the HTML from the XML using JavaScript, then create an element on your HTML page and insert the extracted HTML into it. The only issue is that the XML data you're receiving appears to contain a complete html tag.

If you want to add the content to an existing page, you would have to strip the html and body tags.
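
The same two steps (pull out the html element, then keep only what is inside body) can be sketched in Python rather than JavaScript. This is a rough illustration under several assumptions: the response has a single root element (called `response` here, a made-up name), the embedded HTML is well-formed XML, and no XML namespace is declared on the html element. The sample page content is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; <response> and the page content are assumptions.
xml_text = """<?xml version="1.0"?>
<response>
  <first></first>
  <second></second>
  <xhtml>
    <html><head><title>Demo</title></head><body><p>Hello</p><p>World</p></body></html>
  </xhtml>
</response>"""

root = ET.fromstring(xml_text)

# Step 1: locate the embedded <html> element and serialize it back to text.
html_element = root.find("./xhtml/html")
full_page = ET.tostring(html_element, encoding="unicode", method="html").strip()

# Step 2: to embed the content in an existing page, keep only the
# children of <body>, dropping the <html> and <body> wrappers.
body = html_element.find("body")
fragment = (body.text or "") + "".join(
    ET.tostring(child, encoding="unicode", method="html") for child in body
)

print(full_page)
print(fragment)
```

If the embedded HTML is not well-formed XML (unclosed tags, unquoted attributes), `ET.fromstring` will raise a `ParseError`, and a more lenient approach is needed.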

James Croft
0

If you use Python, the extraction can be very easy.

from simplified_scrapy.simplified_doc import SimplifiedDoc

# Sample response from the question
xml_text = '''
 <?xml >
    <first>
    </first>
        <second>
        </second>
    <xhtml>
        <html>
            .....some html code here
        </html>
    </xhtml>
'''
doc = SimplifiedDoc(xml_text)
page = doc.xhtml.html
print(page)

First you need to install simplified_scrapy using pip.

pip install simplified_scrapy
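
If installing a third-party package is not an option, a similar extraction can be sketched with plain string slicing. This is only an illustration, not what simplified_scrapy does internally: `extract_between` is a hypothetical helper, and it assumes the `<xhtml>` and `</xhtml>` tags appear literally in the response, without attributes or namespace prefixes. Because it never parses the content, it also works when the embedded HTML is not well-formed XML.

```python
def extract_between(text, open_tag, close_tag):
    """Return the text between the first open_tag and the last close_tag, or None."""
    start = text.find(open_tag)
    end = text.rfind(close_tag)
    if start == -1 or end == -1 or end < start:
        return None
    # Skip past the opening tag and trim surrounding whitespace.
    return text[start + len(open_tag):end].strip()


# Sample response modeled on the one in the question.
response = '''
 <?xml >
    <first>
    </first>
    <xhtml>
        <html>
            .....some html code here
        </html>
    </xhtml>
'''

page = extract_between(response, "<xhtml>", "</xhtml>")
print(page)
```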
dabingsou