I want to get the content of this web page using PHP functions (e.g. file_get_contents, curl_init, etc.), but I get ������� as the response.

Why does that happen?

I solved my problem. After fetching the page content, I call mb_convert_encoding($body_webpage, "UTF-8", "GBK"), and the result can now be saved in MySQL with its Chinese characters intact.
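The conversion described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper name `to_utf8` is hypothetical, and it assumes that any body which is not already valid UTF-8 is GBK, as it was for this page.

```php
<?php
// Hypothetical helper: return $body as UTF-8.
// Assumption: input that is not valid UTF-8 is encoded as $assumedSource (GBK here).
function to_utf8(string $body, string $assumedSource = 'GBK'): string
{
    // Bytes that already form valid UTF-8 are returned untouched.
    if (mb_check_encoding($body, 'UTF-8')) {
        return $body;
    }
    // Otherwise reinterpret the bytes as the assumed source encoding.
    return mb_convert_encoding($body, 'UTF-8', $assumedSource);
}

// "中文" encoded as GBK is the four bytes D6 D0 CE C4.
$gbk = "\xD6\xD0\xCE\xC4";
echo to_utf8($gbk), PHP_EOL;
```

The validity check is only a heuristic: a few GBK byte sequences also happen to be valid UTF-8, so when the source encoding is known for certain, calling mb_convert_encoding directly is the safer choice.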

hediehloo

1 Answer


There are a couple of things you need to do to get the Chinese page displayed correctly.

Tell PHP that we're using UTF-8 strings until the end of the script

mb_internal_encoding('UTF-8');

Tell PHP that we'll be outputting UTF-8 to the browser

mb_http_output('UTF-8');

Tell the browser that we'll be using the UTF-8 charset

header('Content-Type: text/xml; charset=UTF-8');


I've successfully loaded the page, with the correct character encoding, by using the code below:

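<?php
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.chinanews.com/rss/scroll-news.xml");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_ENCODING, "");          // accept all supported compression (e.g. gzip) and decode it
$pagebody = curl_exec($ch);
if ($pagebody === false) {
    // curl_exec() returns false on failure; report the reason and stop
    $error = curl_error($ch);
    curl_close($ch);
    exit('cURL error: ' . $error);
}
curl_close($ch);

header('Content-Type: text/xml; charset=UTF-8');
echo $pagebody;
?>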

Learn more about UTF-8 character encoding at

https://phpbestpractices.org/#utf-8

Pedro Lobito
  • Thanks, Pedro Lobito. I want to store $pagebody in a MySQL table with the utf8_general_ci collation, but the stored field is empty after I execute the insert query. I want to store Chinese characters. – hediehloo May 14 '15 at 22:07
  • @hediehloo add MySQL as a tag, post the output of `SHOW CREATE TABLE`, and tell us whether you are using mysqli or PDO. – Rick James May 15 '15 at 04:53
  • Those two `mb` functions aren't really doing anything here. `mb_internal_encoding` just sets an internal value that other functions use; it doesn't itself do anything of importance here. `mb_http_output` is an output filter which automatically converts any output from the *internal encoding* to the given output encoding. Since both are UTF-8, nothing will happen. – deceze May 15 '15 at 07:47
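
For the storage question raised in the comments, one possible approach is sketched below, assuming PDO is used. Everything named here is illustrative rather than taken from the thread: the helpers `mysql_dsn` and `store_page`, the table `pages`, the column `body`, and the credentials are all placeholders. The `utf8mb4` connection charset is also an assumption; the point is only that the connection charset should match the encoding of the data being sent, which must already be converted to UTF-8.

```php
<?php
// Hypothetical helper: build a PDO DSN that requests a utf8mb4 connection.
function mysql_dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Hypothetical helper: insert an already-UTF-8 page body with a prepared statement.
// The table `pages` and the column `body` are placeholder names.
function store_page(PDO $pdo, string $utf8Body): bool
{
    $stmt = $pdo->prepare('INSERT INTO pages (body) VALUES (:body)');
    return $stmt->execute([':body' => $utf8Body]);
}

// Usage (host, database, and credentials are placeholders):
// $pdo = new PDO(mysql_dsn('localhost', 'news'), 'user', 'secret',
//     [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// store_page($pdo, mb_convert_encoding($pagebody, 'UTF-8', 'GBK'));
```

Enabling PDO::ERRMODE_EXCEPTION makes a failed insert raise an error instead of silently leaving the field empty, which should help narrow down where the data is lost.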