While writing a custom RSS feed for my PHP program, I ran into the issue that the ampersand (&) character has to be converted to &amp;. Are there other characters like this? Thanks for any information.

This is invalid:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
    <title>custom user feed</title>
    <item>
        <description>
            <div>a & b</div>
        </description>
    </item>
</channel>
</rss>

Reference: Why can't RSS handle the ampersand?
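
To see the failure concretely, here is a minimal sketch in Python (chosen only for illustration; the question itself concerns PHP) that feeds the document above to the standard library's XML parser. The helper name `is_well_formed` is hypothetical. It assumes nothing beyond `xml.etree.ElementTree` and shows that the bare ampersand makes the feed unparseable, while replacing it with `&amp;` fixes it.

```python
import xml.etree.ElementTree as ET

# The feed from the question, with the unescaped ampersand left in place.
INVALID_FEED = """<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
    <title>custom user feed</title>
    <item>
        <description>
            <div>a & b</div>
        </description>
    </item>
</channel>
</rss>
"""


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


# Same feed with the ampersand written as the predefined entity.
fixed_feed = INVALID_FEED.replace("a & b", "a &amp; b")

print(is_well_formed(INVALID_FEED))  # False
print(is_well_formed(fixed_feed))    # True
```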

Teno

1 Answer

Yes, at a bare minimum, it should be obvious that < will cause you issues, since it would be taken as a tag start. It is usually encoded as &lt;.

See http://en.wikipedia.org/wiki/XML#Escaping for more detail.
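
As a rough sketch of what that escaping looks like in practice, the snippet below uses Python's `xml.sax.saxutils.escape` (Python is an assumption here, used only for illustration; in PHP, `htmlspecialchars()` plays a similar role). The wrapper name `escape_xml_text` is hypothetical. By default `escape` rewrites `&`, `<`, and `>` and leaves quotes alone, which is sufficient for element content.

```python
from xml.sax.saxutils import escape


def escape_xml_text(text: str) -> str:
    """Escape text for use as XML element content (hypothetical wrapper).

    escape() replaces '&' first, then '<' and '>', so already-produced
    entities are not double-escaped within a single call.
    """
    return escape(text)


print(escape_xml_text("a & b < c > \"d\" 'e'"))
# a &amp; b &lt; c &gt; "d" 'e'
```

Note that calling it twice on the same string would escape the `&` of `&amp;` again, so it should be applied exactly once, at the point where raw text is written into the feed.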

paxdiablo
  • Not only "usually": `The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.` see [the XML spec](http://www.w3.org/TR/REC-xml/#syntax) – fvu Oct 14 '12 at 11:08
  • @paxdiablo Thanks for such a quick reply and the information. I tried `a &amp; b &lt; c > "d" 'e'` and it seems that double quotes, single quotes, and `>` can be used without escaping. So `&` and `<` are all I need to care about? – Teno Oct 14 '12 at 11:09
  • @fvu, I agree with you. I was only saying that it is usually encoded that way, not that it can sometimes appear as the literal `<` character. I think there are also other encodings that aren't literally `&lt;` but still give the same result, such as a numeric character reference. – paxdiablo Oct 14 '12 at 11:23
  • @Teno, those are the two key syntax markers that are forbidden. Most of the others are conveniences rather than dictates. – paxdiablo Oct 14 '12 at 11:26
  • @paxdiablo indeed - `If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.` (XML spec). – fvu Oct 14 '12 at 11:28
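
The points in the comments above can be checked with a short sketch, again in Python purely for illustration. It assumes only the standard `xml.etree.ElementTree` parser, and the element name `d` is made up. It parses the same text once with the named entities and once with numeric character references (`&#38;` for `&`, `&#60;` for `<`), leaving `>` and both quote characters unescaped in each case.

```python
import xml.etree.ElementTree as ET

# '&' and '<' escaped with the predefined entities; '>' and quotes left literal.
named_source = "<d>a &amp; b &lt; c > \"d\" 'e'</d>"

# The same content using numeric character references instead.
numeric_source = "<d>a &#38; b &#60; c > \"d\" 'e'</d>"

named = ET.fromstring(named_source)
numeric = ET.fromstring(numeric_source)

print(named.text)                  # a & b < c > "d" 'e'
print(named.text == numeric.text)  # True
```

Both forms parse to the same character data, and the literal `>` and quotes are accepted in element content.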