
I have a question about the correct use of $mysqli->set_charset(). I haven't used it on my site for years. Now I'm rewriting my connection script and want to apply $mysqli->set_charset() properly. At the moment the site still uses latin1, but I will soon switch to UTF-8 (utf8mb4).

MySQLi on my server (which I manage myself) has been configured with latin1 for years. I assume it wouldn't hurt to add $mysqli->set_charset("latin1") now?

And is it true that if MySQLi were configured with utf8mb4 by default, then without the $mysqli->set_charset() call my site would be full of weird, garbled characters?

I'd like to confirm my assumptions.

Wahyu Kristianto
  • Your assumptions are correct. Using `$mysqli->set_charset()` is a good practice for managing character sets and ensuring proper encoding of your data. – Wahyu Kristianto Mar 17 '23 at 19:48
  • First of all, you have to make a clear distinction between **MySQL** (a database) and **mysqli** (a php extension). Also, you have to make a clear distinction between PHP and Mysql. They don't make a solid "site" and can actually use different charsets. This is what set_charset is exactly for: to tell MySQL **what charset is used in PHP** so it can recode on the fly if a table is using a different charset. – Your Common Sense Mar 18 '23 at 07:48
  • Because yes, the encoding used by MySQL as a whole doesn't really matter; only the charset set for a *table* (or a column, for that matter) really matters. And that table charset should reflect the actual charset of the data stored in the table. – Your Common Sense Mar 18 '23 at 08:02
  • So technically you can have latin1 in PHP and utf-8 in the database or vice versa. Though of course it's better to have utf-8 on both ends – Your Common Sense Mar 18 '23 at 08:04
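A minimal Python sketch of the on-the-fly recoding described in these comments. It is a model, not MySQL itself: the server is assumed to decode incoming bytes with the declared connection charset and re-encode them for the column (and the reverse on reads). The helpers `server_store` and `server_fetch` are hypothetical, Python's "utf-8" stands in for utf8mb4, and "latin-1" stands in for MySQL's latin1 (which MySQL actually implements as cp1252).

```python
def server_store(client_bytes: bytes, connection_charset: str, column_charset: str) -> bytes:
    """Hypothetical model of an INSERT: decode with the connection charset, re-encode for the column."""
    return client_bytes.decode(connection_charset).encode(column_charset)


def server_fetch(stored: bytes, connection_charset: str, column_charset: str) -> bytes:
    """Hypothetical model of a SELECT: decode with the column charset, re-encode for the connection."""
    return stored.decode(column_charset).encode(connection_charset)


# The PHP side works in latin1; the column is utf8mb4 (modelled as "utf-8").
php_bytes = "café".encode("latin-1")  # b'caf\xe9'

# The connection honestly declares latin1, so the server recodes correctly.
stored = server_store(php_bytes, "latin-1", "utf-8")
print(stored)  # b'caf\xc3\xa9'

# Reading back over a latin1 connection returns the original latin1 bytes...
print(server_fetch(stored, "latin-1", "utf-8"))  # b'caf\xe9'
# ...while a UTF-8 client reading the same row gets proper UTF-8.
print(server_fetch(stored, "utf-8", "utf-8"))  # b'caf\xc3\xa9'
```

The point of the model: the connection charset describes the PHP side, and the column charset describes the stored data; the two may differ as long as each is declared truthfully.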

2 Answers


mysqli::set_charset() sets the connection's charset, which means "all the strings that I send through this connection will be using this encoding, and I expect that encoding back as well". You need to match this to the encoding that you are using on the PHP side.
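To illustrate why the connection charset must match the PHP side, here is a small Python sketch (not MySQL code) that treats the connection charset as nothing more than a label for the bytes being sent. The helper `server_reads` is hypothetical; "utf-8" stands in for utf8mb4 and "latin-1" for MySQL's latin1.

```python
def server_reads(client_bytes: bytes, declared_charset: str) -> str:
    """Hypothetical model: the server can only interpret bytes using the charset the client declared."""
    return client_bytes.decode(declared_charset)


php_bytes = "café".encode("utf-8")  # a PHP script working in UTF-8 sends b'caf\xc3\xa9'

print(server_reads(php_bytes, "utf-8"))    # café  (label matches what PHP uses)
print(server_reads(php_bytes, "latin-1"))  # cafÃ© (label does not match)
```

Nothing fails in the mismatched case; the server simply reads different text from the same bytes.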

That said, even if the current setting is wrong, you may wind up with broken data if you change it from its current value. This is because in some situations, data that gets mangled in transit from your application to your DB gets un-mangled in the same way on the way back, as long as the settings are consistent.
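A minimal Python model of that masking effect, assuming a PHP script that really sends UTF-8, a connection wrongly declared as latin1, and a utf8mb4 column. The helpers `server_store` and `server_fetch` are hypothetical stand-ins for the server's recoding, not MySQL APIs; "latin-1" approximates MySQL's latin1.

```python
def server_store(client_bytes: bytes, connection_charset: str, column_charset: str) -> bytes:
    """Hypothetical model of an INSERT: decode with the connection charset, re-encode for the column."""
    return client_bytes.decode(connection_charset).encode(column_charset)


def server_fetch(stored: bytes, connection_charset: str, column_charset: str) -> bytes:
    """Hypothetical model of a SELECT: decode with the column charset, re-encode for the connection."""
    return stored.decode(column_charset).encode(connection_charset)


php_bytes = "café".encode("utf-8")  # what PHP really sends

# Wrong label on the way in: each UTF-8 byte is treated as a separate latin1 character.
stored = server_store(php_bytes, "latin-1", "utf-8")
print(stored.decode("utf-8"))  # cafÃ©  (what the column actually contains)

# The same wrong label on the way out undoes the damage, so the site looks fine.
print(server_fetch(stored, "latin-1", "utf-8").decode("utf-8"))  # café

# Correcting only the connection label now exposes the mangled data.
print(server_fetch(stored, "utf-8", "utf-8").decode("utf-8"))  # cafÃ©
```

This is why the stored data has to be checked (and repaired if needed) before the connection charset is changed.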

Before you make any changes, you need to determine which encodings are currently in use and whether the data in your DB is mangled. From there you can plan a path to ensuring that all the encodings match, that your data is correctly encoded and handled at every step, and that your existing data is fixed.

As always, refer to the masterpost: UTF-8 all the way through

Extra thoughts:

  • String encoding is generally not detectable; it is metadata that must be tracked separately.
  • latin1 is actually ISO-8859-1, but beware its evil twin cp1252, which stuffs extra symbols, notably €, into the 0x80 to 0x9F byte range that ISO-8859-1 leaves to control characters.
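Both points can be seen in a few lines of Python (a sketch of byte decoding, not of MySQL behaviour). According to the MySQL manual, MySQL's own latin1 character set is in fact cp1252, which is one more reason to be careful about which "latin1" is meant.

```python
raw = b"\x80"

# Both decodes succeed without any error; nothing in the byte says which one is right.
print(repr(raw.decode("latin-1")))  # '\x80' (ISO-8859-1: a C1 control character)
print(raw.decode("cp1252"))         # €      (Windows-1252)

# The same byte string also decodes without error under all three charsets,
# so a successful decode is no evidence that the chosen charset is correct.
text_bytes = b"caf\xc3\xa9"
for charset in ("utf-8", "latin-1", "cp1252"):
    print(charset, text_bytes.decode(charset))
# utf-8 café
# latin-1 cafÃ©
# cp1252 cafÃ©
```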
Sammitch

It's quite simple. Call mysqli_set_charset() with the charset you expect your data to be encoded in. So if the data in your table's columns is stored in utf8mb4, use that charset for the connection.

You cannot set a default charset in mysqli. The mysqli extension will instead use the default value that MySQL provides for its clients. This is why it's recommended to always set the charset explicitly using mysqli_set_charset(). Unless you are dealing with some legacy database, always set your charset to utf8mb4, which covers the widest range of characters.
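To show what "the widest range of characters" means in practice: MySQL's older utf8 charset (utf8mb3) stores at most 3 bytes per character, while utf8mb4 allows the full 4-byte UTF-8 range, which includes emoji. The helper `fits_utf8mb3` below is a hypothetical illustration in Python, not a MySQL or PHP function.

```python
def fits_utf8mb3(text: str) -> bool:
    """Hypothetical check: True if every character needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


for sample in ("café", "€", "😀"):
    # Print the sample, its total UTF-8 length in bytes, and whether utf8mb3 could hold it.
    print(sample, len(sample.encode("utf-8")), fits_utf8mb3(sample))
# café 5 True
# € 3 True
# 😀 4 False
```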

Dharman
  • "to the value you expect" is true, but "if the data in your table's columns is stored in utf8mb4 then use that charset for the connection" is not quite right. They can have utf8mb4 in the database and latin1 on the site, and in that case mysqli_set_charset() should get latin1. MySQL will then recode on the fly wherever possible based on this correct charset. In short, mysqli_set_charset should set the actual encoding used in PHP, not the encoding used in MySQL. So if one *expects* latin1 from MySQL, it should be set so. – Your Common Sense Mar 18 '23 at 07:42