8

Would anyone know of a reliable method (with mySQL or otherwise) to select rows in a database that contain Japanese characters? I have a lot of rows in my database, some of which only have alphanumeric characters, some of which have Japanese characters.

Rio
  • 14,182
  • 21
  • 67
  • 107
  • a simple `SELECT` does the job, so can you describe what your table is and what you're doing after the `SELECT` – Teneff Mar 19 '11 at 06:28
  • Is there any way to select a row that contains any Japanese character? – Rio Mar 19 '11 at 23:01
  • Can you give us the output from `SHOW CREATE TABLE table_name`, where `table_name` is the name of the table in question? Based on what is there, you may or may not be able to do this. – Charles Mar 20 '11 at 02:16
  • Japanese speaker here. In Japanese, while there are only 71 kana characters, there are over 50,000 kanji characters. Therefore it probably is not feasible to write a `SELECT` statement which matches over 50,000 different Japanese characters. – Leo Galleguillos Feb 04 '20 at 00:52

5 Answers5

10

Rules when you have problem with character sets:

  1. While creating database use utf8 encoding:

    CREATE DATABASE  _test DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
    
  2. Make sure all text fields (varchar and text) are using UTF-8:

    CREATE TABLE _test.test (
      id INT NOT NULL AUTO_INCREMENT,
      name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE = MyISAM;
    
  3. When you make a connection do this before you query/update the database:

    SET NAMES utf8;
    
  4. With phpMyAdmin - Choose UTF-8 when you login.

  5. set web page encoding to utf-8 to make sure all post/get data will be in UTF-8 (or you'll have to since converting is painful..). PHP code (first line in the php file or at least before any output):

    header('Content-Type: text/html; charset=UTF-8');
    
  6. Make sure all your queries are written in UTF8 encoding. If using PHP:

6.1. If PHP supports code in UTF-8 - just write your files in UTF-8.

6.2. If php is compiled without UTF-8 support - convert your strings to UTF-8 like this:

    $str = mb_convert_encoding($str, 'UTF-8', '<put your file encoding here');
    $query = 'SELECT * FROM test WHERE name = "' . $str . '"';

That should make it work.

Channa
  • 742
  • 17
  • 28
NickSoft
  • 3,215
  • 5
  • 27
  • 48
2

Following on to the helpful answer NickSoft, i had to set the encoding on the db connection to get it to work.

&characterEncoding=UTF8

Then the SET NAMES utf8; seemed to be redundant

mart
  • 21
  • 1
0

As teneff stated, just use SELECT.

When installing MySQL, use UTF-8 as charset. Then, choosing utf8_general_ci as collation should do the work.

Maxime Pacary
  • 22,336
  • 11
  • 85
  • 113
0

As Frosty stated, just use SELECT.

Look up the lowest and highest valued Japanese characters in the Unicode charts at http://www.unicode.org/roadmaps/bmp/ and use REGEXP. It may use several different regions of characters to get the whole Japanese character set. As long as you use the UTF-8 charset and utf8_general_ci collation, you should be able to use a REGEXP '[a-gk-nt-z]' where a-g represents one range of Unicode characters from the charts, k-n represents another range, etc.

penguin359
  • 1,289
  • 13
  • 26
  • Actually, I think you'll want the utf8_bin collation method. This will compare the them byte-wise. In UTF-8, that will mean a higher Unicode value will always compare later. This way you can just look up the various ranges of Japanese characters in the Unicode charts and use the first and last character in each range inside []. – penguin359 Apr 01 '11 at 10:23
-1

There is limited number of japanese characters. You can search for these using

SELECT ... LIKE '%カ%'

Alternatively you can try their hexadecimal denomination -

SELECT ...LIKE CONCAT('%',CHAR(0x30ab),'%')

You may find useful this UTF-8 Japanese subset http://www.utf8-chartable.de/unicode-utf8-table.pl?start=12448

Supposing you're using UTF-8 character set for fields, queries, results...

jhavrda
  • 403
  • 2
  • 13
  • Is there any way to select a row that contains any Japanese character? – Rio Mar 19 '11 at 23:01
  • 1
    You might combine all characters in one query (might be slow) `code`SELECT ... LIKE '%カ%' OR LIKE '%ア%' Worth trying is regular expression `code`SELECT ... REGEXP '[カア]' – jhavrda Mar 21 '11 at 08:21
  • `like '%xxx%'` and `regexp` are equally slow, because they don't use index. In both cases it's a complete table scan – NickSoft Apr 03 '11 at 14:23
  • Japanese speaker here. In Japanese, while there are only 71 kana characters, there are over 50,000 kanji characters. Therefore it probably is not feasible to write a `SELECT` statement which matches over 50,000 different Japanese characters. – Leo Galleguillos Feb 04 '20 at 00:50