
Hi, when I use PHP to parse websites in other languages, such as Chinese or Japanese, I get weird characters. How can I preserve the original text when storing it in MySQL?

The technologies I have used are:

DOMDocument

json_decode
Asked by aron n; edited by hakre
  • You have to use the same character encoding throughout the entire pipeline, e.g. utf-8 everywhere. – Marc B Jan 17 '11 at 04:59
  • @Marc: in practice that means converting locale-encoded pages (e.g. SJIS) to UTF-8, and in many cases you will have to guess the encoding, because pages in the default locale don't need to declare it. – Steve-o Jan 17 '11 at 06:17
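A minimal sketch of the conversion step described in the comments, assuming PHP 8 with the mbstring and DOM extensions enabled. `to_utf8()` and `load_utf8_html()` are hypothetical helper names, and the candidate encoding list is an assumption you would tune for the sites you scrape; detection is only ever a guess. The second helper works around `DOMDocument::loadHTML()` treating markup without a charset declaration as ISO-8859-1.

```php
<?php
declare(strict_types=1);

/**
 * Convert raw bytes fetched from a page into UTF-8.
 * $declared is the charset from the HTTP Content-Type header or a <meta> tag,
 * if the page provides one (extracting it is left to your fetch code).
 */
function to_utf8(string $bytes, ?string $declared = null): string
{
    if ($declared !== null && $declared !== '') {
        // Trust an explicit declaration first (an unknown name throws ValueError).
        return mb_convert_encoding($bytes, 'UTF-8', $declared);
    }
    if (mb_check_encoding($bytes, 'UTF-8')) {
        // Already valid UTF-8 (this includes plain ASCII): nothing to do.
        return $bytes;
    }
    // Guess among a few legacy CJK encodings (assumed, non-exhaustive list).
    $guess = mb_detect_encoding($bytes, ['SJIS', 'EUC-JP', 'EUC-CN', 'BIG-5', 'EUC-KR'], true);
    if ($guess === false) {
        throw new RuntimeException('Could not determine the page encoding');
    }
    return mb_convert_encoding($bytes, 'UTF-8', $guess);
}

/**
 * Load UTF-8 HTML into DOMDocument without mangling non-ASCII text.
 * Every character from U+0080 up is turned into a numeric entity first, so the
 * parser only sees ASCII and decodes the entities back to the real characters.
 */
function load_utf8_html(string $utf8Html): DOMDocument
{
    $ascii = mb_encode_numericentity($utf8Html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    $dom = new DOMDocument();
    // Scraped HTML is rarely valid; silence libxml parse warnings.
    $dom->loadHTML($ascii, LIBXML_NOERROR | LIBXML_NOWARNING);
    return $dom;
}
```

Strings you read back with `textContent` are UTF-8. Entities inside `<script>` or `<style>` are not decoded by the HTML parser, so this approach suits extracting ordinary text content.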

1 Answer

MySQL can store data in Unicode. If you set the MySQL database, table and column character sets to UTF-8 (in MySQL, use `utf8mb4`: the character set MySQL calls `utf8` only stores characters up to three bytes long), you can store the foreign-language text correctly. You should also send a `SET NAMES utf8mb4` command to MySQL each time you connect from PHP, so that it knows you're sending data in UTF-8.
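A sketch of what the database, table and column settings can look like. The DDL is built in PHP so you can run it over whatever connection you already have (for example with `PDO::exec()`); the function, database, table and column names are hypothetical, and `utf8mb4_unicode_ci` is just one reasonable collation. The character set decides how text is stored, while the collation decides how it is compared and sorted.

```php
<?php
declare(strict_types=1);

/**
 * Build DDL for a database and table that store scraped text as utf8mb4.
 * All names are placeholders.
 */
function utf8mb4_schema_sql(string $database, string $table): array
{
    // Identifiers cannot be bound as query parameters, so whitelist them.
    foreach ([$database, $table] as $name) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', $name)) {
            throw new InvalidArgumentException("Unsafe identifier: $name");
        }
    }
    return [
        "CREATE DATABASE IF NOT EXISTS `$database` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",
        "CREATE TABLE IF NOT EXISTS `$database`.`$table` (
            id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            url VARCHAR(2048) NOT NULL,
            body MEDIUMTEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
        ) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci",
    ];
}
```

The column-level `CHARACTER SET` repeats the table default on purpose, to show that each level can be set explicitly. The connection must use the same character set, or the server will convert (and possibly mangle) the text on the way in.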

A bigger problem is PHP, which still has no real Unicode support. That seems kind of unbelievable to me for a major web development language in 2011, but there you go. It'll work OK if all you do in PHP is get, update and display the Unicode data from MySQL, but you'll run into problems if you try to do much string manipulation with it, because core functions such as strlen() and substr() work on bytes rather than characters.
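A small illustration of the byte-versus-character problem, assuming the mbstring extension is enabled (it is bundled with PHP but not always compiled in). Core functions count and cut bytes, while their `mb_*` counterparts count characters, which covers much of the string manipulation the answer warns about.

```php
<?php
declare(strict_types=1);

$text = '日本語';  // three characters, each three bytes long in UTF-8

// Core string functions work on bytes...
$bytes  = strlen($text);         // 9
$broken = substr($text, 0, 4);   // stops partway through the second character
// ...while the mbstring functions work on characters.
$chars  = mb_strlen($text, 'UTF-8');        // 3
$prefix = mb_substr($text, 0, 2, 'UTF-8');  // '日本'

printf(
    "%d bytes, %d chars, prefix=%s, substr result valid UTF-8? %s\n",
    $bytes,
    $chars,
    $prefix,
    mb_check_encoding($broken, 'UTF-8') ? 'yes' : 'no'
);
```

Passing `'UTF-8'` explicitly avoids depending on the `mb_internal_encoding()` setting of the server.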

Data sent via ajax (e.g. your JSON data) also has to be sent UTF-8 encoded, so you need a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> on each of your HTML pages too.
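On the PHP side, a sketch of keeping JSON in UTF-8, assuming PHP 8 (for the `mixed` type; `JSON_THROW_ON_ERROR` needs 7.3+). `json_decode()` only accepts UTF-8 and by default reports bad input by returning `null`, which is easy to miss, and `json_encode()` escapes non-ASCII characters as `\uXXXX` unless `JSON_UNESCAPED_UNICODE` is set. The helper names are hypothetical, and the `Content-Type` header complements the `<meta>` tag rather than replacing it.

```php
<?php
declare(strict_types=1);

/** Decode JSON, throwing JsonException on invalid UTF-8 instead of returning null. */
function decode_json_utf8(string $json): mixed
{
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

/** Encode JSON with CJK characters left as-is rather than \uXXXX escapes. */
function encode_json_utf8(mixed $data): string
{
    return json_encode($data, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}

/** Send an Ajax response whose body and declared charset agree. */
function send_json_utf8(mixed $data): void
{
    header('Content-Type: application/json; charset=utf-8');
    echo encode_json_utf8($data);
}
```

Escaped `\uXXXX` output is still valid JSON and decodes to the same characters; the flag mainly makes responses readable and smaller for CJK text.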

Michael Low
  • The correct term is setting the collation, and it may be best to set the collation on the table to the language you are using because of sorting accuracy. More information here: http://stackoverflow.com/questions/367711/what-is-the-best-collation-to-use-for-mysql-with-php – Andy Chase Sep 21 '12 at 18:52
  • -1 - `SET NAMES` should not be suggested firsthand; PHP has Unicode support. – hakre Sep 24 '12 at 07:56
  • Ouch. 1) Yes, SET NAMES is deprecated now, but it was the simplest way with PDO on old versions of PHP, which didn't support a charset in the connection string (this answer is coming up on two years old, after all). 2) PHP still doesn't have full, native Unicode string support in the sense that other languages do: http://stackoverflow.com/questions/571694/what-factors-make-php-unicode-incompatible . Full Unicode support has long been a stated goal of PHP 6. – Michael Low Sep 24 '12 at 10:46
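For reference, a sketch of setting the connection character set without issuing `SET NAMES`, as the comments discuss. The PDO `charset` DSN option is honoured from PHP 5.3.6 onward; for mysqli, `set_charset()` is the documented way, and unlike a `SET NAMES` query it also updates the character set that `real_escape_string()` uses. Host, database and credentials are placeholders, the function names are hypothetical, and `utf8mb4` assumes the tables were created with that character set.

```php
<?php
declare(strict_types=1);

/** Build a PDO MySQL DSN that sets the connection charset directly. */
function mysql_utf8mb4_dsn(string $host, string $database): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $database);
}

/** PDO connection using the DSN charset instead of a SET NAMES query. */
function connect_utf8mb4(string $host, string $database, string $user, string $password): PDO
{
    return new PDO(mysql_utf8mb4_dsn($host, $database), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

/** mysqli equivalent: set_charset() also informs the client library's escaping. */
function connect_utf8mb4_mysqli(string $host, string $database, string $user, string $password): mysqli
{
    // Make connection and charset errors throw instead of failing quietly.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
    $db = new mysqli($host, $user, $password, $database);
    $db->set_charset('utf8mb4');
    return $db;
}
```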