
My database collation is set to utf8_unicode_ci, and all files are encoded in UTF-8 (without BOM).

Here is my PHP code:

<?php
    require_once("./includes/config.php");

    $article = new Article();

    $fields = array(
        'status' => '0',
        'title' => 'מכבי ת"א אלופת אירופה בפעם ה-9',
        'shorttitle' => 'מכבי ת"א אלופת אירופה',
        'priority' => '1',
        'type' => '1',
        'category' => '2',
        'template' => '68',
        'author' => '1',
        'date' => date("Y-m-d H:i"),
        'lastupdate' => date("Y-m-d H:i"),
        'preview' => 'בלה בלה בלה',
        'content' => 'עוד קצת בלה בלה בלה',
        'tags' => 'מכבי ת"א,יורוליג,אליפות אירופה',
        'comments' => '1'
    );

    $article->set($fields);
    $article->save();

For some reason, the Hebrew characters appear like this in phpMyAdmin:

מכבי ת"× ×לופת ×ירופה ×‘×¤×¢× ×”-9

Database connection code:

<?php
    final class Database
    {
        protected $fields;
        protected $con;

        public function __construct($host = "", $name = "", $username = "", $password = "")
        {
            if ($host == "")
            {
                global $config;

                $this->fields = array(
                    'dbhost' => $config['Database']['host'],
                    'dbname' => $config['Database']['name'],
                    'dbusername' => $config['Database']['username'],
                    'dbpassword' => $config['Database']['password']
                );

                $this->con = new mysqli($this->fields['dbhost'], $this->fields['dbusername'], $this->fields['dbpassword'], $this->fields['dbname']);

                if ($this->con->connect_errno > 0)
                    die("<b>Database connection error:</b> ".$this->con->connect_error);
            }
            else
            {
                $this->con = new mysqli($host, $username, $password, $name);

                if ($this->con->connect_errno > 0)
                    die("<b>Database connection error:</b> ".$this->con->connect_error);
            }
        }

Any ideas why?

Martijn Pieters
Naxon

2 Answers


You have set the database's and file's character set to UTF-8, but the data transfer between PHP and the database also needs to be set correctly.

You can do this using set_charset:

Sets the default character set to be used when sending data from and to the database server.

Add the following as the last statement of your Database constructor:

$this->con->set_charset("utf8");

This will not fix the issue for the data that is already in the database, but for new data written to the database you should notice the difference.
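
To see why the rows already stored look garbled, here is a minimal sketch of the mechanism. It assumes the connection fell back to a single-byte default such as latin1; ISO-8859-1 stands in for it here, which is an approximation (MySQL's latin1 is closer to Windows-1252). The variable names are illustrative only.

```php
<?php
// Sketch only: ISO-8859-1 stands in for the connection's assumed latin1 default.
$original = 'בלה'; // three Hebrew letters, two bytes each in UTF-8

// On a mis-declared connection every UTF-8 byte is treated as one
// single-byte character and then converted to UTF-8 a second time.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Reversing that conversion recovers the original bytes.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

printf("original: %d bytes, garbled: %d bytes\n", strlen($original), strlen($garbled));
var_dump($restored === $original);
```

The doubled byte count is the signature of this kind of double encoding. Whether existing rows can be repaired the same way depends on how they were actually written, so test on a copy of the data first.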

If you decide to rebuild your database, consider using the superior utf8mb4 character set, as described in the MySQL docs:

The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters:

  • For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf8 cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8 cannot store the character at all, you do not have any supplementary characters in utf8 columns and you need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.

utf8mb4 is a superset of utf8

Community
trincot

It's important that your entire line of code uses the same charset, to avoid issues where characters display incorrectly.

There are a few settings that need to be properly defined, and I'd strongly recommend UTF-8 for all of them: it covers the letters you need (Hebrew) and also supports a wide variety of other alphabets (Scandinavian, Greek, Arabic).

Here's a short list of things that have to be set to a specific charset.

Headers

Set the charset to UTF-8 in both the PHP and the HTML headers:

  • PHP: header('Content-Type: text/html; charset=utf-8');
    (PHP headers have to be sent before any kind of output (echo, whitespace, HTML))

  • HTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    (HTML-headers are placed within the <head> / </head> tag)
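
As a rough sketch of how the two header settings fit together, the example below sends the HTTP header before any output and puts the meta tag inside the head. The function names renderPage and sendPage are hypothetical helpers made up for this illustration, not part of the question's code.

```php
<?php
// Hypothetical helper: builds a minimal HTML page that declares UTF-8.
function renderPage(string $title, string $body): string
{
    // Escape with an explicit charset so multi-byte text is treated as UTF-8.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $safeBody  = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n"
        . "<title>{$safeTitle}</title>\n</head>\n<body>\n"
        . "<p>{$safeBody}</p>\n</body>\n</html>\n";
}

// Hypothetical helper: sends the HTTP header first, then the page itself.
function sendPage(string $title, string $body): void
{
    // header() only works while nothing has been output yet.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
    echo renderPage($title, $body);
}
```

Calling sendPage('מכבי ת"א', 'בלה בלה בלה') at the very top of a script, before any echo or stray whitespace, would be one way to use it.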

Connection

You also need to specify the charset in the connection itself (placed directly after creating the connection).

$this->con->set_charset("utf8");
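
For context, here is a minimal sketch of where that call sits relative to the connection. The function name connectUtf8 and the exception-based error handling are assumptions for illustration; the question's class uses die() instead.

```php
<?php
// Hypothetical helper: host, credentials and database name come from the caller.
function connectUtf8(
    string $host,
    string $username,
    string $password,
    string $name,
    string $charset = 'utf8'
): mysqli {
    $con = new mysqli($host, $username, $password, $name);

    // On PHP 8.1+ mysqli throws exceptions by default, so this check
    // mainly matters on older versions or with error reporting turned off.
    if ($con->connect_errno > 0) {
        throw new RuntimeException('Database connection error: ' . $con->connect_error);
    }

    // Set the connection charset right after connecting, before any query.
    // set_charset() returns false on failure.
    if (!$con->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $con->error);
    }

    return $con;
}
```

Passing 'utf8mb4' as the last argument would be the variant to try if the tables themselves use utf8mb4.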

Database and tables

Your database and all its tables have to be set to UTF-8. Note that charset is not exactly the same as collation (see this post).

You can do that by running the queries below once for each database and table (for example in phpMyAdmin):

ALTER DATABASE yourDatabase CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE yourTable CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
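
If there are many tables, the statements can be generated rather than typed by hand. The sketch below only builds the SQL strings; buildConversionStatements and the table names passed to it are hypothetical, and the charset and collation arguments are not validated, so only pass trusted values.

```php
<?php
// Hypothetical helper: returns the ALTER statements for one database and its tables.
function buildConversionStatements(
    string $database,
    array $tables,
    string $charset = 'utf8',
    string $collation = 'utf8_unicode_ci'
): array {
    // Quote identifiers with backticks, doubling any backtick inside the name.
    $quote = function (string $identifier): string {
        return '`' . str_replace('`', '``', $identifier) . '`';
    };

    $statements = [
        sprintf('ALTER DATABASE %s CHARACTER SET %s COLLATE %s;',
            $quote($database), $charset, $collation),
    ];

    foreach ($tables as $table) {
        $statements[] = sprintf('ALTER TABLE %s CONVERT TO CHARACTER SET %s COLLATE %s;',
            $quote($table), $charset, $collation);
    }

    return $statements;
}

// The table names here are placeholders.
foreach (buildConversionStatements('yourDatabase', ['articles', 'comments']) as $sql) {
    echo $sql, "\n";
}
```

Review the generated statements and back up the database before running them, since CONVERT TO rewrites the stored column data.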

Other

  • Some functions accept a charset as a parameter; if you are using such functions, specify UTF-8 there as well.
  • It may be that you already have values in your database that are not encoded with UTF-8. Updating them manually could be a pain and could consume a lot of time. Should this be the case, you could use something like ForceUTF8 and loop through your databases, updating the fields with that function.
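
A few built-in functions illustrate both points: they take the charset as an argument, and mb_check_encoding can flag values that are not valid UTF-8. This is a small sketch assuming the mbstring extension is loaded; the sample bytes in the last line are the letters אב as they would be encoded in Windows-1255.

```php
<?php
$text = 'ת"א';

// strlen() counts bytes, mb_strlen() counts characters in the given charset.
$bytes = strlen($text);             // 5: two 2-byte letters plus one ASCII quote
$chars = mb_strlen($text, 'UTF-8'); // 3

// htmlspecialchars() takes the charset as its third argument.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8'); // ת&quot;א

// mb_check_encoding() reports whether a string is valid in the given charset.
$valid   = mb_check_encoding($text, 'UTF-8');       // true
$invalid = mb_check_encoding("\xE0\xE1", 'UTF-8');  // false: not valid UTF-8

var_dump($bytes, $chars, $escaped, $valid, $invalid);
```

A check like this can help find the rows that need converting before looping over them.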

Should you follow all of the pointers above, chances are your problem will be solved. If not, you can take a look at this StackOverflow post: UTF-8 all the way through.

Community
Qirel