
I have this little piece of code, reading a very easy affiliate XML. The URL is http://pf.tradetracker.net/?aid=193104&encoding=iso-8859-1&type=xml-v2&fid=556821&r=tt-canvasholidays.nl&categoryType=2&additionalType=2

When I try to put this XML in an array and process it with some basic functions, it seems that special characters are being replaced by encoded ones.

For example, the word Château in the original XML is replaced by ChÃ¢teau when I read the XML in my PHP as follows:

$xml = simplexml_load_file($file);
foreach($xml->product as $child) {
  $name = mysql_real_escape_string($child->name);
}

I had written an escape function that replaces most of the Ãx sequences with the correct characters, but PHP does not seem to accept the "â" string in the code.

I have read some articles about wrong database conversion, but I'm just reading and interpreting the data from the source XML, and no encoding is specified in the simplexml_load_file call.

Oh, by the way, I'm not a very experienced PHP programmer. I have just stuck to WordPress in combination with proven technologies for affiliate product feeds, until some French data came through :)

Hope you can help... Thanks in advance...

rodney

1 Answer


As you probably know, your XML document is in iso-8859-1 encoding. But I suspect you're outputting $name to a UTF-8 encoded page (although it doesn't explain the conversion of Château to ChÃ¢teau).

Once you've resolved the character encoding issue, you shouldn't need to replace characters or try to fix them.

You can try one of the following (3 is best):

  1. Set the output encoding to iso-8859-1
  2. Convert the input XML to UTF-8 with iconv
  3. Request the XML document in UTF-8: http://pf.tradetracker.net/?aid=193104&encoding=utf-8&type=xml-v2&fid=556821&r=tt-canvasholidays.nl&categoryType=2&additionalType=2
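As a rough illustration of option 2, and of how the Ã sequences can arise, here is a minimal sketch using PHP's iconv() on hand-written byte strings. The sample strings are assumptions, not data taken from the feed. Note that iconv() returns the converted string; it does not modify its argument.

```php
<?php
// "Château" as ISO-8859-1 bytes: "â" is the single byte 0xE2.
$latin1 = "Ch\xE2teau";

// Convert once: "â" becomes the two UTF-8 bytes 0xC3 0xA2.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// Convert data that is already UTF-8 a second time: each of the two
// bytes is treated as a separate ISO-8859-1 character, which produces
// the kind of garbling described in the question.
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $utf8);

// Print the raw bytes as hex so the result does not depend on the
// encoding of the terminal or browser.
echo bin2hex($utf8), "\n";
echo bin2hex($doubleEncoded), "\n";
```

If the hex dump of a value already shows c3a2 for "â", the data is UTF-8 and converting it again would only make things worse.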

If you're still having problems, then I suspect your source document is being downloaded incorrectly.

The following code works fine for me (Using http://phphttpclient.com/downloads/httpful.phar):

<?php
include('./httpful.phar');

$xml_string = \Httpful\Request::get('http://pf.tradetracker.net/?aid=193104&encoding=utf-8&type=xml-v2&fid=556821&r=tt-canvasholidays.nl&categoryType=2&additionalType=2')->send();
$xml = simplexml_load_string($xml_string);

header('Content-Type: text/html; charset=utf-8');

foreach($xml->product as $child) {
    echo $child->name . "<br>";
}

?>

Update: It seems that SimpleXML will convert output to UTF-8 automatically, presumably based on the encoding declaration of the XML.
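To check the behaviour described in the update, a small self-contained sketch can parse an ISO-8859-1 document from a string and inspect what SimpleXML hands back. The sample XML is hand-written, and the products root element is a hypothetical stand-in for the feed's layout (only product and name appear in the question).

```php
<?php
// Hand-written ISO-8859-1 sample; "\xE2" is "â" in that encoding.
// The <products> root element is an assumption about the feed's layout.
$xmlBytes = '<?xml version="1.0" encoding="iso-8859-1"?>'
    . "<products><product><name>Camping Ch\xE2teau de Boisson</name></product></products>";

$xml = simplexml_load_string($xmlBytes);
if ($xml === false) {
    throw new RuntimeException('Could not parse the sample XML');
}

// Casting a SimpleXMLElement to string yields UTF-8 text,
// even though the document itself declared iso-8859-1.
$name = (string) $xml->product[0]->name;

// Tell the browser the output is UTF-8 so the bytes are displayed correctly.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo htmlspecialchars($name, ENT_QUOTES, 'UTF-8'), "\n";
```

Under these assumptions no manual character replacement is needed: the string is already UTF-8, and only the page's declared charset has to match.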

Alastair McCormack
  • Thanks for your time. I had already tried utf-8 for the XML document too, without any positive result. iconv didn't change the result either... – rodney Apr 15 '15 at 06:25
  • Result of 3: tt_Canvasholidays.nl tt_Canvasholidays.nl_1_AR05S | Camping ChÃ¢teau de Boisson | 0.00 | ArdÃ¨che => succeeded tt_Canvasholidays.nl_2_CE01X | Camping Val de Cantobre | 0.00 | ArdÃ¨che => succeeded – rodney Apr 15 '15 at 07:12
  • Result of 2 is the same. I added this code to my PHP: iconv("iso-8859-1", "utf-8//TRANSLIT",$xml); This was with step 3 already implemented, so the XML document was already in utf-8. – rodney Apr 15 '15 at 07:16
  • Step 1 didn't influence the result either :( – rodney Apr 15 '15 at 07:18
  • Are you still running the result through your `mysql_real_escape_string()` function before printing the result? – Alastair McCormack Apr 15 '15 at 07:30
  • Hi @rodney, I've updated my answer to include a working example. You shouldn't need `mysql_real_escape_string()` once everything is working. – Alastair McCormack Apr 15 '15 at 08:22
  • I included your code as you specified, but the output is still "Camping ChÃ¢teau de Boisson" – rodney Apr 15 '15 at 09:25
  • Make sure your browser is running in "auto" encoding mode or UTF-8 mode. Through my testing I managed to knock my browser into iso-8859-1, which was breaking the response – Alastair McCormack Apr 15 '15 at 09:31
  • By the way, I "echo" it to have some sort of debug response. In the end, this data must be stored in a MySQL database, but it is stored there as it is presented in the echo output. When I add the line header('Content-Type: text/html; charset=utf-8'); the output is presented correctly: "Camping Château de Boisson". Your header line seems to be critical for presenting the output correctly. – rodney Apr 15 '15 at 09:32
  • The two critical parts are: getting the content in UTF-8 format before passing to SimpleXML and ensuring the browser knows the correct encoding. Now that you've got good UTF-8 strings, ensure that your database is created as UTF-8 and your connection is set as UTF-8. Good luck :) – Alastair McCormack Apr 15 '15 at 09:36
  • What do you think the encoding of the DB column should be: utf8_bin or utf8_unicode_ci? – rodney Apr 15 '15 at 09:36
  • It's not my area of expertise. Try: http://stackoverflow.com/questions/5526334/what-effects-does-using-a-binary-collation-have – Alastair McCormack Apr 15 '15 at 09:40
  • Nice detail: your header line for the encoding alone did the work. When I removed all the other code tips, the output was still presented correctly. Putting these strings into the database is still not working, with either utf8_bin or utf8_unicode_ci... :( – rodney Apr 15 '15 at 09:40
  • Your comments gave me some more understanding. Thanks for that. I will look into the other article. – rodney Apr 15 '15 at 09:42
  • Make sure your **connection** to your MySQL DB is also set to UTF-8. There are lots of SO questions on the subject, and the details depend on which MySQL driver you're using. See: http://stackoverflow.com/a/4623748/1554386. Again, make sure that you set the `Content-Type` header and the connection encoding in the page that reads data from the DB – Alastair McCormack Apr 15 '15 at 10:01
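The connection-encoding advice in the last comment could look roughly like the sketch below. It assumes the mysqli driver, which the thread does not confirm. The helper names, the products table, and its name column are hypothetical, and utf8mb4 is chosen here as MySQL's full UTF-8 character set.

```php
<?php
// Hypothetical helper: opens a mysqli connection and switches it to UTF-8.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    if ($conn->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $conn->connect_error);
    }
    // Without this call the connection may use a different default
    // character set and garble UTF-8 strings on the way in or out.
    if (!$conn->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . $conn->error);
    }
    return $conn;
}

// Hypothetical table and column names. A prepared statement takes the
// place of manual escaping with mysql_real_escape_string().
function store_product_name(mysqli $conn, string $name): void
{
    $stmt = $conn->prepare('INSERT INTO products (name) VALUES (?)');
    if ($stmt === false) {
        throw new RuntimeException('Prepare failed: ' . $conn->error);
    }
    $stmt->bind_param('s', $name);
    $stmt->execute();
    $stmt->close();
}
```

A caller would open the connection once and then pass each (string) $child->name value to store_product_name(). The table and column should use a UTF-8 character set as well.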