
My DB has some text, probably copied and pasted from a Word document, that contains curly quotes and curly apostrophes. My PHP code generates an XML document with this text as the content of one of its elements.

This is the error I get when I try to display the XML document:

This page contains the following errors:

error on line 40 at column 1: Encoding error

Below is a rendering of the page up to the first error.

I've tried what is suggested in the post here, but it isn't working for me. I also tried

$output = iconv('UTF-8', 'ASCII//TRANSLIT', $input);

as mentioned here. This one displays the text up to the first curly quote or apostrophe and then stops. Do I need to specify a different output character set here?

Is there any function available in PHP to handle these kinds of special characters when generating an XML document? I am using the <?xml version="1.0" encoding="utf-8"?> declaration for the XML document.

Here is some of my code:

<?php
header('Content-type: text/xml');
echo '<?xml version="1.0" encoding="utf-8"?>';

// $result is assumed to come from an earlier mysql_query() call (not shown).
$item = mysql_fetch_object($result);
?>
<listitems>
    <item>
        <name><?=htmlspecialchars(stripslashes($item->name))?></name>
        <details><?=htmlspecialchars(stripslashes($item->details))?></details>
        .
        .
        .
        .

    </item>
</listitems>
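For reference, here is a minimal sketch of how htmlspecialchars() treats bytes that are not valid UTF-8 when the charset argument is 'UTF-8'. The flags are passed explicitly because the defaults differ between PHP versions; the input string is a hypothetical latin1 value. Note that ENT_SUBSTITUTE only keeps the output well-formed by replacing the bad byte with U+FFFD; it does not recover the original character.

```php
<?php
// Hypothetical value: "caf" followed by the latin1 byte 0xE9 ("é"), which is not valid UTF-8.
$latin1 = "caf\xE9";

// Without ENT_SUBSTITUTE (or ENT_IGNORE), invalid input makes htmlspecialchars() return "".
$strict = htmlspecialchars($latin1, ENT_XML1 | ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE, each invalid sequence is replaced by U+FFFD (bytes EF BF BD).
$lenient = htmlspecialchars($latin1, ENT_XML1 | ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict);           // string(0) ""
var_dump(bin2hex($lenient)); // string(12) "636166efbfbd"
```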
S K
2 Answers


The table definition says DEFAULT CHARSET=latin1.

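It could be that you are fetching ISO-8859-1 data and outputting it as UTF-8. That would produce invalid byte sequences for characters beyond the 128 basic ASCII characters: ISO-8859-1 stores each of them as a single byte in the range 0x80-0xFF, which usually does not form a valid UTF-8 sequence.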
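A minimal sketch of that mismatch, assuming the column holds the word "café": in ISO-8859-1 the "é" is one byte, in UTF-8 it is two. The helper name is_valid_utf8() is hypothetical; it relies on the fact that preg_match() with the /u modifier returns false when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper: preg_match() with /u returns false for invalid UTF-8,
// and 1 otherwise (an empty pattern always matches).
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// "café" as stored in a latin1 column: "é" is the single byte 0xE9.
$latin1 = "caf\xE9";

// The same word encoded as UTF-8: "é" is the two bytes 0xC3 0xA9.
$utf8 = "caf\xC3\xA9";

var_dump(is_valid_utf8($latin1)); // bool(false)
var_dump(is_valid_utf8($utf8));   // bool(true)
```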

Try this iconv():

$output = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', $input);
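A sketch of how this conversion might slot into the XML output, assuming the row value arrives as ISO-8859-1 bytes (the value below is made up). The order matters: convert to UTF-8 first, then escape with htmlspecialchars() using ENT_XML1 so the XML special characters become entities.

```php
<?php
// Hypothetical row value as it might come back from a latin1 column.
$name = "Caf\xE9 & <Bar>";

// 1. Re-encode from the column's charset to UTF-8.
$utf8 = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', $name);

// 2. Escape &, <, >, " and ' for XML (ENT_XML1 uses XML entities such as &apos;).
$escaped = htmlspecialchars($utf8, ENT_XML1 | ENT_QUOTES, 'UTF-8');

header('Content-Type: text/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="utf-8"?>', "\n";
echo "<name>$escaped</name>\n"; // <name>Café &amp; &lt;Bar&gt;</name>
```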

Pekka
  • Thanks Pekka, it works :) Just want to know: can we write it like this, iconv('latin1', 'UTF-8', $appObj->description);, since the table says latin1? Just want to understand it better. – S K Jan 09 '12 at 09:06
  • “Latin1” could also refer to Windows Latin 1, windows-1252. In fact, ISO-8859-1 does not contain curly apostrophes or curly quotes, though it is not uncommon for programs to effectively treat ISO-8859-1 as windows-1252. – Jukka K. Korpela Jan 09 '12 at 11:06
  • @Jukka ah, that detail escaped me; good to know, thanks! [MySQL's latin1 seems equivalent to Windows-1252 rather than ISO-8859-1](http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html). @S K, this would mean that you might need to use `windows-1252` as the first parameter to iconv if the curly quotes don't work out. – Pekka Jan 09 '12 at 11:25
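To illustrate the Windows-1252 point from the comments, here is a sketch with hypothetical raw bytes such as Word might produce. It assumes the iconv implementation accepts the name 'WINDOWS-1252' (glibc and GNU libiconv do). Decoded as Windows-1252, bytes 0x93, 0x94 and 0x92 become the curly quotes; decoded as ISO-8859-1, they become invisible C1 control characters.

```php
<?php
// Hypothetical raw bytes for: “Hello” and it’s
// In Windows-1252, 0x93/0x94 are curly double quotes and 0x92 is a curly apostrophe.
$raw = "\x93Hello\x94 and it\x92s";

// Windows-1252 maps those bytes to U+201C, U+201D and U+2019.
$asWindows1252 = iconv('WINDOWS-1252', 'UTF-8', $raw);

// ISO-8859-1 maps them to the C1 controls U+0093, U+0094 and U+0092,
// so the output is valid UTF-8 but the quotes are lost.
$asIso88591 = iconv('ISO-8859-1', 'UTF-8', $raw);

echo $asWindows1252, "\n"; // “Hello” and it’s
```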

Try wrapping the text nodes that contain the curly apostrophes in CDATA sections, like this:

<text><![CDATA[This is my test´s text]]></text>

That way the XML parser does not interpret that text as markup, and it gets rendered correctly.
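A minimal sketch of generating such a section in PHP. The helper name cdata() is hypothetical. A literal "]]>" inside the text would close the section early, so it is split across two sections. Keep in mind that CDATA only changes how characters like < and & are parsed; the bytes inside must still be valid in the declared encoding, so this alone does not cure an encoding mismatch.

```php
<?php
// Hypothetical helper: wrap text in a CDATA section.
// Any "]]>" in the text is split so it cannot terminate the section early.
function cdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo '<text>', cdata('This is my test´s text'), "</text>\n";
// <text><![CDATA[This is my test´s text]]></text>

echo cdata('a]]>b'), "\n";
// <![CDATA[a]]]]><![CDATA[>b]]>
```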

bardiir
  • It displays the data, but only some of it; the output just stops in the middle. – S K Jan 09 '12 at 08:07