I am responding to an AJAX call by sending it an XML document through PHP echos. In order to form this XML document, I loop through the records of a database. The problem is that the database includes records that have '<' symbols in them. So naturally, the browser throws an error at that particular spot. How can this be fixed?
Did you try creating a function that will replace all sensible character by their xml equivalents. Or maybe include all value with potential character within "" ? – David Brunelle Aug 06 '10 at 17:17
7 Answers
Since PHP 5.4 you can use:
htmlspecialchars($string, ENT_XML1);
You should specify the encoding, such as:
htmlspecialchars($string, ENT_XML1, 'UTF-8');
Update
Note that the above will only convert:
& to &amp;
< to &lt;
> to &gt;
If you want to escape text for use in an attribute enclosed in double quotes:
htmlspecialchars($string, ENT_XML1 | ENT_COMPAT, 'UTF-8');
will convert " to &quot; in addition to &, < and >.
And if your attributes are enclosed in single quotes:
htmlspecialchars($string, ENT_XML1 | ENT_QUOTES, 'UTF-8');
will convert ' to &apos; in addition to &, <, > and ".
(Of course you can use this even outside of attributes).
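To make the difference between element text and attribute values concrete, here is a minimal sketch. The `$row` array and the `record` element are made up for illustration and stand in for one database record; only `htmlspecialchars` and its flags come from the answer above.

```php
<?php
// Hypothetical record, standing in for one row fetched from the database.
$row = ['id' => 7, 'title' => 'Tom & "Jerry"', 'body' => "5 < 6 and it's fine"];

// Element content: escaping &, < and > is enough.
$body = htmlspecialchars($row['body'], ENT_XML1, 'UTF-8');

// Attribute value enclosed in double quotes: also escape ".
$title = htmlspecialchars($row['title'], ENT_XML1 | ENT_COMPAT, 'UTF-8');

$xml = sprintf('<record id="%d" title="%s">%s</record>', $row['id'], $title, $body);
echo $xml, "\n";
// prints: <record id="7" title="Tom &amp; &quot;Jerry&quot;">5 &lt; 6 and it's fine</record>
```

The single quote in the body is left alone because `ENT_QUOTES` was not passed, which is fine for element content.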

htmlspecialchars($string, ENT_XML1, 'UTF-8') worked well for me; actually I do this for all of them just for safety – Miguel Sep 16 '15 at 18:33
In cases where you are formatting a string for SimpleXML that needs to be XML validated this seems to be the cleanest working solution. I am dealing with lots of special characters being used and this solved my issues. – Ryan Rentfro Oct 22 '15 at 19:05
`htmlspecialchars` does not escape `\xB` (vertical tab) for instance, which is [invalid XML](https://stackoverflow.com/q/14192135/2683737). – Rainer Rillke May 22 '20 at 11:01
By either escaping those characters with htmlspecialchars, or, perhaps more appropriately, using a library for building XML documents, such as DOMDocument or XMLWriter.
Another alternative would be to use CDATA sections, but then you'd have to look out for occurrences of ]]>.
Also take into consideration that you must respect the encoding you define for the XML document (by default UTF-8).
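As a rough illustration of the library route, here is a sketch using XMLWriter. The `$rows` array and the `records`/`record` element names are invented for the example and stand in for the database loop from the question.

```php
<?php
// Hypothetical rows, standing in for the database result set.
$rows = [
    ['id' => 1, 'name' => 'a < b'],
    ['id' => 2, 'name' => 'Tom & Jerry'],
];

$w = new XMLWriter();
$w->openMemory();                       // build the document in memory
$w->startDocument('1.0', 'UTF-8');
$w->startElement('records');
foreach ($rows as $row) {
    $w->startElement('record');
    $w->writeAttribute('id', (string) $row['id']);
    $w->text($row['name']);             // text() escapes &, < and > itself
    $w->endElement();                   // </record>
}
$w->endElement();                       // </records>
$w->endDocument();

$xml = $w->outputMemory();
echo $xml;
```

In an AJAX response you would normally also send a `Content-Type: text/xml; charset=UTF-8` header before echoing the document.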

htmlspecialchars isn't the best way of doing it, because as the name suggests it's meant for HTML output, not XML. It will, for example, convert < to &lt;, when for XML the correct encoding is &#60;. DOMDocument, SimpleXML or similar XML-aware extensions would be a better bet. – GordonM Jan 07 '11 at 12:48
@Gordon Hum? Since when is `&lt;` not correct for XML? `htmlspecialchars` actually only does entity substitution with entities that are guaranteed to be available for *any* XML document, and even leaves one behind (replaces `'` with `&#039;` when it could use `&apos;`; of course, `&#039;` is correct too). – Artefacto Jan 08 '11 at 00:13
@Gordon By the way, there are *some* reasons why `htmlspecialchars` may be insufficient for XML (namely, it doesn't replace forbidden characters in XML and it doesn't encode forbidden entities when $double_encode is TRUE) -- which, btw, I have addressed by introducing profiles in trunk's version of htmlspecialchars/entities --, but what you say is simply not true. What you're describing is a double encoding, you need `&lt;` in XML in the same circumstances you'd need it in HTML -- when you need to represent `<`. – Artefacto Jan 08 '11 at 00:15
Not sure if < is the best example, but it is a very real problem with htmlspecialchars. It's fundamentally intended for HTML escaping, not XML. PHP provides better tools for the job than htmlspecialchars, and those should be used instead. – GordonM Jan 08 '11 at 16:19
I have an issue trying to insert strings with pound signs in the data (£), and htmlentities does not work, I do not think this is the correct answer, unless for some reason I'm doing something wrong. Using htmlentities, the string it returns is not accepted by DOMDocument::loadXML function. any other suggestions? – Ninjanoel Jul 04 '13 at 16:21
`or using a library for building XML documents, such as DOMDocument` it doesn't help – Vasilii Suricov Feb 26 '20 at 15:21
1) You can wrap your text as CDATA like this:
<mytag>
<![CDATA[Your text goes here. Btw: 5<6 and 6>5]]>
</mytag>
see http://www.w3schools.com/xml/xml_cdata.asp
2) As already someone said: Escape those chars. E.g. like so:
5&lt;6 and 6&gt;5
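If you go the CDATA route in PHP, the text still must not contain the terminator `]]>`. A common workaround is to split the section at that point. The sketch below assumes a helper called `cdataWrap`, which is a made-up name and not a built-in function.

```php
<?php
// Hypothetical helper: wraps text in a CDATA section and splits every "]]>"
// across two sections so the first one cannot be closed early.
function cdataWrap(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo '<mytag>', cdataWrap('5<6 and 6>5'), '</mytag>', "\n";
// prints: <mytag><![CDATA[5<6 and 6>5]]></mytag>

echo cdataWrap('a]]>b'), "\n";
// prints: <![CDATA[a]]]]><![CDATA[>b]]>
```

A parser reads the two adjacent sections back as the single string `a]]>b`.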

*oops* I overlooked that CDATA was already mentioned in the previous answer – Elvith Aug 06 '10 at 17:21
You made it very clear what I needed to do, so I appreciate that, regardless of whether it was already mentioned. I ended up using your solution for a quick fix, but the best practice would probably be to use XMLWriter as Artefacto mentioned, so I'm giving the best answer to him. – JayD3e Aug 06 '10 at 17:29
+1 for CDATA (but be careful, XML parsers can be set up to leave CDATA blocks out of the parsed tree) – GordonM Feb 13 '13 at 17:12
Try this:
$str = htmlentities($str,ENT_QUOTES,'UTF-8');
So, after filtering your data using the htmlentities() function, you can use the data in an XML tag like:
<mytag>$str</mytag>

After filtering your data using htmlentities function, you can use the data in XML tag like <mytag>$str</mytag> – Mosiur Jan 02 '14 at 14:11
If at all possible, it's always a good idea to create your XML using the XML classes rather than string manipulation. One of the benefits is that the classes will automatically escape characters as needed.
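For instance, a sketch with DOMDocument might look like the following. The `$rows` array and the element names are assumptions made for the example, not something from the question.

```php
<?php
// Hypothetical rows, standing in for the database result set.
$rows = [
    ['id' => 1, 'name' => 'a < b'],
    ['id' => 2, 'name' => 'Tom & Jerry'],
];

$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('records'));

foreach ($rows as $row) {
    $record = $doc->createElement('record');
    $record->setAttribute('id', (string) $row['id']);
    // createTextNode() takes the raw string; escaping happens when serializing.
    $record->appendChild($doc->createTextNode($row['name']));
    $root->appendChild($record);
}

$xml = $doc->saveXML();
echo $xml;
```

The text goes in through `createTextNode()` rather than as the second argument of `createElement()`, because the latter does not escape a bare `&` for you.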

Adding this in case it helps someone.
As I am working with Japanese characters, encoding has also been set appropriately. However, from time to time, I find that htmlentities and htmlspecialchars are not sufficient.
Some user inputs contain special characters that are not stripped by the above functions. In those cases I have to do this:
preg_replace('/[\x00-\x1f]/','',htmlspecialchars($string))
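This will also remove certain XML-unsafe control characters such as the null character or EOT. Note that the range \x00-\x1f also strips tab, line feed and carriage return, which are legal in XML. You can use this table to determine which characters you wish to omit.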
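If you would rather keep tab, line feed and carriage return, one option is to remove only the code points that XML 1.0 does not allow. This is a sketch under that assumption; `stripInvalidXmlChars` is a made-up helper name, and returning an empty string for malformed UTF-8 is just one possible choice.

```php
<?php
// Hypothetical helper: drops code points outside the XML 1.0 Char range
// while keeping tab (0x9), line feed (0xA) and carriage return (0xD).
function stripInvalidXmlChars(string $s): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $s
    );
    // preg_replace() returns null if the subject is not valid UTF-8.
    return $clean ?? '';
}

$safe = htmlspecialchars(stripInvalidXmlChars("a\x00b\x0Bc\td<e"), ENT_XML1, 'UTF-8');
// $safe is "abc\td&lt;e": NUL and vertical tab are gone, the tab survives.
```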

I prefer the way Golang does quote escaping for XML (and a few extras like newline escaping, and escaping some other characters), so I have ported its XML escape function to PHP below
// Reports whether code point $r is inside the XML 1.0 character range.
function isInCharacterRange(int $r): bool {
    return $r == 0x09 ||
        $r == 0x0A ||
        $r == 0x0D ||
        $r >= 0x20 && $r <= 0xD7FF ||
        $r >= 0xE000 && $r <= 0xFFFD ||
        $r >= 0x10000 && $r <= 0x10FFFF;
}
// Escapes $s for XML the way Go's encoding/xml does.
function xml(string $s, bool $escapeNewline = true): string {
    $w = '';
    $Last = 0;              // byte offset just past the last escaped character
    $l = strlen($s);
    $i = 0;
    while ($i < $l) {
        // Read one UTF-8 character and advance by its width in bytes.
        $r = mb_substr(substr($s, $i), 0, 1);
        $Width = strlen($r);
        $i += $Width;
        switch ($r) {
            case '"':
                $esc = '&#34;';
                break;
            case "'":
                $esc = '&#39;';
                break;
            case '&':
                $esc = '&amp;';
                break;
            case '<':
                $esc = '&lt;';
                break;
            case '>':
                $esc = '&gt;';
                break;
            case "\t":
                $esc = '&#x9;';
                break;
            case "\n":
                if (!$escapeNewline) {
                    continue 2;
                }
                $esc = '&#xA;';
                break;
            case "\r":
                $esc = '&#xD;';
                break;
            default:
                // Replace characters that are not allowed in XML with U+FFFD.
                if (!isInCharacterRange(mb_ord($r)) || (mb_ord($r) === 0xFFFD && $Width === 1)) {
                    $esc = "\u{FFFD}";
                    break;
                }
                continue 2;
        }
        // Copy the unescaped run before this character, then the escape.
        $w .= substr($s, $Last, $i - $Last - $Width) . $esc;
        $Last = $i;
    }
    $w .= substr($s, $Last);
    return $w;
}
Note you'll need at least PHP 7.2 because of the mb_ord usage, or you'll have to swap it out for another polyfill, but these functions are working great for us!
For anyone curious, here is the relevant Go source https://golang.org/src/encoding/xml/xml.go?s=44219:44263#L1887
