I'm currently parsing an RSS feed and parsing the HTML in its description field in order to create a custom XML structure.

The description field contains ‘ and ’ characters, and PHP outputs them as regular question marks. How come?

I've tried different encodings like UTF-8 and ISO-8859-1, but nothing works.

This is the XML I'm parsing: http://www.ilovetechno.be/artists_rss.xml

This is how it should get parsed: http://www.crowdsurferapp.com/clients/ilovetechno/

Bundy
  • Take a look at this question and my answer to it: http://stackoverflow.com/questions/910793/php-detect-encoding-and-make-everything-utf-8 – Gumbo Sep 25 '09 at 09:43

3 Answers

The encoding of an XML document is determined in a predefined order:

  1. charset parameter in the HTTP header field Content-Type:

    Content-Type: application/xml; charset=<character encoding>
  2. encoding attribute in the XML declaration:

    <?xml version="1.0" encoding="<character encoding>"?>

If both are missing, the default character encoding (UTF-8 or UTF-16) is used.

So in order to parse the XML document with the proper encoding, you need to look for that information. See my answer to the question "PHP: Detect encoding and make everything UTF-8" for a solution.
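The lookup order described above can be sketched roughly as follows. This is only an illustration, not the solution from the linked question: `detect_xml_encoding()` and its parameters are hypothetical names, and the regular expressions are a simplification that does not handle a declaration that is itself encoded in UTF-16.

```php
<?php
/**
 * Guess the character encoding of an XML document.
 * Hypothetical helper: the name and signature are illustrative only.
 *
 * @param string      $xml         Raw bytes of the XML document.
 * @param string|null $contentType Value of the HTTP Content-Type header, if known.
 */
function detect_xml_encoding(string $xml, ?string $contentType = null): string
{
    // 1. charset parameter of the HTTP Content-Type header.
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }

    // 2. encoding attribute of the XML declaration (an optional UTF-8 BOM may precede it).
    if (preg_match('/^(?:\xEF\xBB\xBF)?<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']/', $xml, $m)) {
        return strtoupper($m[1]);
    }

    // 3. Neither is present: use a default; a UTF-16 byte order mark selects UTF-16.
    $bom = substr($xml, 0, 2);
    if ($bom === "\xFE\xFF" || $bom === "\xFF\xFE") {
        return 'UTF-16';
    }
    return 'UTF-8';
}
```

A call such as `detect_xml_encoding($body, $contentTypeHeader)` would then give the encoding name to convert from, where `$body` and `$contentTypeHeader` stand for whatever your HTTP client returned.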

I also recommend using UTF-8 for your internal processing and as the output encoding, since it is one of the default character encodings for XML.
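As a minimal sketch of normalizing input to UTF-8, the snippet below uses PHP's `iconv()`. The helper name `to_utf8()` is hypothetical, and the source encoding is an assumption: ‘ and ’ have no code points in ISO-8859-1, while Windows-1252 places them at bytes 0x91 and 0x92, so the example assumes the text is really Windows-1252. Whether that holds for the feed in the question has to be checked against the actual bytes.

```php
<?php
/**
 * Convert a string to UTF-8 for internal processing.
 * Hypothetical helper; $from must name the real encoding of the input bytes.
 */
function to_utf8(string $text, string $from = 'CP1252'): string
{
    $converted = iconv($from, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $from to UTF-8");
    }
    return $converted;
}

// Bytes 0x91 and 0x92 are the curly single quotes in Windows-1252 (assumed input).
$description = "\x91quoted\x92";
$utf8 = to_utf8($description);
```

After the conversion, `$utf8` holds the curly quotes as the three-byte UTF-8 sequences for U+2018 and U+2019, so they survive in UTF-8 output.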

Gumbo
  • … you missed 3.: Byte order mark. Not sure whether 2. or 3. has precedence, though. – Konrad Rudolph Sep 25 '09 at 11:15
  • @Konrad Rudolph: You’re right. But I think it’s rather used to choose between the two default encodings if none of the above is present. – Gumbo Sep 25 '09 at 11:27

You also have to set the correct encoding in your HTML meta tags and/or in your HTTP headers.
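A minimal sketch of declaring the charset in the HTTP header from PHP, assuming the output is XML encoded as UTF-8. `content_type_header()` is a hypothetical helper used only to keep the header line in one place; `header()` and `headers_sent()` are the standard PHP functions.

```php
<?php
/**
 * Build a Content-Type header line that declares the charset.
 * Hypothetical helper, for illustration only.
 */
function content_type_header(string $mime, string $charset = 'UTF-8'): string
{
    return sprintf('Content-Type: %s; charset=%s', $mime, $charset);
}

// Headers must be sent before any output is written.
if (!headers_sent()) {
    header(content_type_header('application/xml'));
}
```

For an HTML page, the equivalent would be `content_type_header('text/html')` together with a `<meta charset="utf-8">` tag in the document head.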

knittl
Change the encoding in the XML declaration

    <?xml version="1.0" encoding="iso-8859-1"?>

to utf-8.
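Changing only the label is safe if the bytes already are UTF-8. If they are not, the content has to be re-encoded at the same time. The sketch below assumes the document really is ISO-8859-1, as its declaration says; `xml_latin1_to_utf8()` is a hypothetical helper name.

```php
<?php
/**
 * Re-encode an ISO-8859-1 XML document as UTF-8 and update its declaration.
 * Hypothetical helper; assumes the declared encoding matches the actual bytes.
 */
function xml_latin1_to_utf8(string $xml): string
{
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $xml);
    if ($utf8 === false) {
        throw new RuntimeException('Could not convert the document to UTF-8');
    }

    // Rewrite the encoding attribute of the XML declaration, if there is one.
    $result = preg_replace(
        '/^(<\?xml[^>]*\bencoding\s*=\s*["\'])[^"\']+(["\'])/',
        '${1}UTF-8${2}',
        $utf8,
        1
    );

    // preg_replace() returns null only on a regex error; fall back to the converted text.
    return $result ?? $utf8;
}
```

The result can then be handed to the XML parser or written out with a matching `charset=UTF-8` header.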

Vladislav Rastrusny