90

I need to insert HTML content into an XML document, is this possible or should HTML content be, for example, encoded in BASE64 or with something else like that?

pros-cons
newbie

5 Answers

164

You can include HTML content. One possibility is encoding it in BASE64 as you have mentioned.

Another might be using a CDATA section.

Example using CDATA:

<xml>
    <title>Your HTML title</title>
    <htmlData><![CDATA[<html>
        <head>
            <script/>
        </head>
        <body>
        Your HTML's body
        </body>
        </html>
     ]]>
    </htmlData>
</xml>

Please note:

CDATA's opening character sequence: <![CDATA[

CDATA's closing character sequence: ]]>
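As a rough sketch of the CDATA approach in code, the snippet below uses Python's standard xml.dom.minidom to wrap an HTML string in a CDATA section and then reads it back with xml.etree.ElementTree. The element names (document, htmlData) and the sample HTML are only illustrative assumptions, not part of any schema.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Sample HTML payload (illustrative only).
html = "<html><head><script/></head><body>Your HTML's body</body></html>"

# Build a small XML document; "document" and "htmlData" are made-up names.
doc = minidom.Document()
root = doc.createElement("document")
doc.appendChild(root)

title = doc.createElement("title")
title.appendChild(doc.createTextNode("Your HTML title"))
root.appendChild(title)

html_data = doc.createElement("htmlData")
# createCDATASection stores the text verbatim between <![CDATA[ and ]]>.
html_data.appendChild(doc.createCDATASection(html))
root.appendChild(html_data)

xml_text = doc.toxml()

# Reading it back: the parser hands the CDATA content over as plain text.
recovered = ET.fromstring(xml_text).findtext("htmlData")
```

Note that this only works as written when the HTML does not contain the closing sequence ]]> itself.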

Pablo Santa Cruz
  • 7
    +1 CDATA is way better IMO because it keeps things human-readable, and doesn't come with base64's unavoidable 33% additional weight – Pekka Dec 10 '10 at 19:15
  • 3
    just remember that XML and CDATA preserve white-space. – zzzzBov Dec 10 '10 at 19:18
  • 5
    I decided to use BASE64 because it preserves the data with 100% accuracy, whereas CDATA and escaping can pick up extra whitespace if the XML is reformatted. CDATA would also add complexity, since the HTML would have to be pre-scanned in case it contains a CDATA section itself... – newbie Oct 25 '11 at 06:49
  • 1
    The link has expired. – Franklin Yu Sep 01 '17 at 14:25
27

As long as your HTML content doesn't itself contain a CDATA section (more precisely, the closing sequence ]]>), you can wrap the HTML in a CDATA section; otherwise you'll have to escape the XML special characters.

<element><![CDATA[<p>your html here</p>]]></element>

VS

<element>&lt;p&gt;your html here&lt;/p&gt;</element>
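Both variants can be produced programmatically. The following is a minimal sketch in Python, assuming the standard-library xml.sax.saxutils.escape for the escaped form; the helper names as_escaped and as_cdata are hypothetical. The CDATA helper uses the common workaround of splitting any ]]> in the payload across two adjacent CDATA sections.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def as_escaped(html: str) -> str:
    """Escape &, < and > so the HTML becomes ordinary XML text."""
    return escape(html)


def as_cdata(html: str) -> str:
    """Wrap the HTML in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + html.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# A payload that itself contains a CDATA section (the awkward case).
html = "<p>x <![CDATA[inner]]> & y</p>"

escaped_xml = "<element>" + as_escaped(html) + "</element>"
cdata_xml = "<element>" + as_cdata(html) + "</element>"

# Either form should parse back to the original HTML string.
from_escaped = ET.fromstring(escaped_xml).text
from_cdata = ET.fromstring(cdata_xml).text
```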
zzzzBov
8

The purpose of BASE64 encoding is to take binary data and represent it as a string. That benefit comes at a cost: an increase in the size of the result (I think it's a 4 to 3 ratio). There are two alternatives. If you know the data will be well-formed XML, include it directly. The other, and better, option is to include the HTML in a CDATA section within an element of the XML.
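To make the size cost concrete, here is a small sketch using Python's standard base64 module. The sample HTML string is an arbitrary assumption; the point is only to compare the byte length before and after encoding, which should come out close to the 4 to 3 ratio mentioned above.

```python
import base64

# Arbitrary sample payload, repeated so padding effects are negligible.
html = "<p>your html here</p>" * 100
raw = html.encode("utf-8")

# Base64 maps every 3 input bytes to 4 output characters.
encoded = base64.b64encode(raw).decode("ascii")
ratio = len(encoded) / len(raw)

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded).decode("utf-8")

print(f"raw: {len(raw)} bytes, encoded: {len(encoded)} bytes, ratio: {ratio:.2f}")
```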

Rich
2

Please see the link below.

Text inside a CDATA section is not parsed as markup; the parser passes it through as plain character data.

http://www.w3schools.com/xml/dom_cdatasection.asp

This will help you understand the basics of XML.

Bhargav Rao
loyola
0

Just put the HTML tags with their content directly into the XML and add an xmlns attribute to them whose quoted value is "http://www.w3.org/1999/xhtml". This only works if the HTML is well-formed XML (that is, XHTML).
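A minimal sketch of the namespace approach, assuming Python's standard xml.etree.ElementTree and made-up wrapper element names (document, title): the embedded markup is parsed as real elements in the XHTML namespace rather than as opaque text.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The embedded HTML must be well-formed XML for this to parse.
xml_text = (
    "<document>"
    "<title>Your HTML title</title>"
    f'<div xmlns="{XHTML_NS}"><p>your html here</p></div>'
    "</document>"
)

root = ET.fromstring(xml_text)

# ElementTree addresses namespaced elements as {namespace}localname.
paragraph = root.find(f"{{{XHTML_NS}}}div/{{{XHTML_NS}}}p")
paragraph_text = paragraph.text
```

Unlike the CDATA and BASE64 options, this keeps the HTML queryable with ordinary XML tooling, at the price of requiring strict well-formedness.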