
I have spent the entire day trying to figure this out, and I hope you can help.

PROBLEM: inserting "Cosío" into a MySQL database

The string gets cut at the accented character, so only "Cos" is inserted.

If I run echo mb_detect_encoding($_POST['name'], "auto"); it shows UTF-8.

After reading posts here and there, I followed some of the advice and set the following:

mysql database : collation = utf8_general_ci
mysql table: collation = utf8_general_ci
mysql field: collation = utf8_general_ci

I am using the CodeIgniter framework, and my database connection is as follows:

$active_group = 'default';
$query_builder = TRUE;

$db['default'] = array(
    'dsn'   => '',
    'hostname' => 'localhost',
    'username' => 'root',
    'password' => '',
    'database' => 'codeigniter',
    'dbdriver' => 'mysqli',
    'dbprefix' => '',
    'pconnect' => FALSE,
    'db_debug' => (ENVIRONMENT !== 'production'),
    'cache_on' => FALSE,
    'cachedir' => '',
    'char_set' => 'utf8',
    'dbcollat' => 'utf8_general_ci',
    'swap_pre' => '',
    'encrypt' => FALSE,
    'compress' => FALSE,
    'stricton' => false,
    'failover' => array(),
    'save_queries' => TRUE
);

In the Apache config I also added AddDefaultCharset utf-8.

I also declared the HTML tag <meta charset="UTF-8">.

I have read several SO posts, but with no success. What am I missing?

UPDATE: I am getting closer to the problem. Before running the insert query, I sanitize all POST variables like so:

$ready_for_insert = ucwords(strtolower(filter_var($_POST['name'], FILTER_SANITIZE_STRING)));

If I remove FILTER_SANITIZE_STRING, everything works. I use it to strip tags and malicious input from the string, so I don't know whether I should remove it.
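One way to narrow this down further is to check whether the string is still valid UTF-8 after each step of that line. The following is only a minimal sketch, assuming the mbstring extension is loaded; `utf8_report` is a hypothetical helper and `'COSÍO'` is a stand-in for `$_POST['name']`. It compares the byte-oriented `strtolower()`/`ucwords()` pair with `mb_convert_case()`. On PHP versions before 8.2 the byte-oriented functions depend on the current locale and may alter bytes inside a multibyte character.

```php
<?php
// Hypothetical helper: report whether a string is still valid UTF-8,
// together with its raw bytes in hex so any change is visible.
function utf8_report(string $label, string $value): string
{
    $valid = mb_check_encoding($value, 'UTF-8') ? 'valid' : 'INVALID';
    return sprintf('%s: %s UTF-8, bytes=%s', $label, $valid, bin2hex($value));
}

$name = 'COSÍO'; // stand-in for $_POST['name'] (assumption)

// Byte-oriented: strtolower() and ucwords() work on single bytes.
$byteBased = ucwords(strtolower($name));

// Multibyte-aware title casing from the mbstring extension.
$mbBased = mb_convert_case($name, MB_CASE_TITLE, 'UTF-8');

echo utf8_report('input', $name), PHP_EOL;
echo utf8_report('ucwords(strtolower())', $byteBased), PHP_EOL;
echo utf8_report('mb_convert_case()', $mbBased), PHP_EOL;
```

If a step reports INVALID, that step is where the bytes get damaged. Note also that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so it is worth testing that step separately in the same way.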

AL DI

1 Answer

MySQL's utf8 character set does not support the full Unicode range. It is an alias for utf8mb3, which stores at most three bytes per character.

In your example, Cosío, the value is cut at í. When MySQL is handed a byte sequence it cannot store in the column's character set, it discards that character and everything after it (in non-strict SQL mode), so only Cos is saved in the database.

If you want to support full Unicode, you have to switch the encoding to utf8mb4.

This can be done as follows.

For each database:

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

For each table:

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

For each column:

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

(Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a VARCHAR column.)
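The connection has to use the same character set as the tables, otherwise the conversion above has no effect on what the client sends. As a sketch only, assuming the CodeIgniter 3 layout from the question (the file path is an assumption), the two charset-related keys of the existing `$db['default']` array would change like this:

```php
<?php
// application/config/database.php (CodeIgniter 3 layout assumed)
// Place these lines after the existing $db['default'] = array(...) definition.
// Only the two charset-related keys change; everything else stays as it is.
$db['default']['char_set'] = 'utf8mb4';
$db['default']['dbcollat'] = 'utf8mb4_unicode_ci';
```

Editing the `'char_set'` and `'dbcollat'` entries directly inside the array works just as well.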

Please refer to this link for more info.

yuvanesh