I connect to a Microsoft Access database via odbc_connect in PHP like this:

$conn = odbc_connect("Driver={Microsoft Access Driver (*.mdb)};Dbq=mydatabase.mdb;Client_CSet=UTF-8;Server_CSet=UTF-8", "", "");

Then I execute a query like this:

odbc_exec($conn, "INSERT INTO mytable (id, name) VALUES ('1', 'Звезда')");

When I open the database in Microsoft Access, the Russian letters are displayed as shown below:

---------------------
id | name
---------------------
1  | Р—РІРµР·РґР°

The PHP file is saved in UTF-8, so the word "Звезда" is sent as UTF-8 bytes. The Windows system locale for non-Unicode programs is Windows-1251.
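
A minimal sketch of that mismatch, written in Python rather than PHP purely for illustration: the UTF-8 bytes of the word are decoded as if they were Windows-1251, which is what the non-Unicode connection appears to do. The variable names are hypothetical.

```python
# UTF-8 bytes that a UTF-8 encoded script would send for the word
word = "Звезда"
utf8_bytes = word.encode("utf-8")

# A non-Unicode layer that assumes Windows-1251 decodes the same bytes like this
misread = utf8_bytes.decode("cp1251")

print(len(utf8_bytes))  # two bytes per Cyrillic letter
print(misread)          # mojibake: one character per byte

# The damage is reversible here because every byte maps to a cp1251 character
restored = misread.encode("cp1251").decode("utf-8")
print(restored == word)
```

This only illustrates the byte-level misreading; it says nothing about where in the PHP/ODBC stack the wrong decoding actually happens.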

How can I fix the encoding when connecting via ODBC?

NoSkill
  • The word "Звезда" in UTF-8 encoding is `\xd0\x97\xd0\xb2\xd0\xb5\xd0\xb7\xd0\xb4\xd0\xb0`, but these bytes interpreted in code page _Windows-1251_ give the mojibake. I can't give a `php` proof; however, `"Звезда".encode('utf-8').decode('cp1251')` in `python` returns that weird mojibake string `'Р—РІРµР·РґР°'`… – JosefZ Oct 21 '20 at 21:09
  • @JosefZ It's because my system locale for non-Unicode programs is `Windows-1251`. So `odbc` simply interprets the `UTF-8` encoded string `Звезда` as `Р—РІРµР·РґР°`. This weird string is then stored in the MS Access database, and MS Access interprets `Р—РІРµР·РґР°` as `UNICODE` symbols. – NoSkill Oct 22 '20 at 12:48
  • I can confirm that Microsoft Access files are encoded in whatever Windows codepage was the default in the computer that created the file. As far as I know you need to `iconv()` or `mb_convert_encoding()` all around from/to UTF-8. – Álvaro González Oct 22 '20 at 17:36
  • @ÁlvaroGonzález Can you please explain how to solve this issue using encoding conversion? It seems that ODBC driver (or connection type) is not Unicode. How to send UTF-8 encoded data through non-unicode driver/connection? – NoSkill Oct 24 '20 at 14:01
  • You need to figure out the Access encoding (you already suggest it's probably Windows-1251) and encode *from* UTF-8 the text you send and encode *to* UTF-8 the text you receive, with the aforementioned functions. – Álvaro González Oct 24 '20 at 15:25
  • @ÁlvaroGonzález Starting with `Office 2000`, `Access` uses `UTF-16LE` (but it can handle `UTF-8` data too if a connection uses `UTF-8` char-sets). `PHP` uses `UTF-8` in binary mode. The connection between `Access` and `PHP` via `ODBC` is probably `non-Unicode` and uses the `ANSI` char-set of the current system locale (for me it's `Windows-1251`). So my question is how to fix the encoding when connecting via `ODBC`, to send `Unicode` symbols to `Access`? – NoSkill Oct 24 '20 at 17:55
  • Sorry if I wasn't being precise (my only experience with Access from PHP is via ODBC). You can do `mb_convert_encoding($row['value'], 'UTF-8', 'Windows-1251');` when reading from the DB and `mb_convert_encoding($value, 'Windows-1251', 'UTF-8')` when inserting or updating into the DB. – Álvaro González Oct 24 '20 at 18:36
  • @Álvaro González But this is not a Unicode solution! This method will work only for `Cyrillic` letters that are present in `Windows-1251`! How about storing this string: `αβγδАБВГҐґԱԲაბგვⴀⴁⴂⴃ驚いた彼`? – NoSkill Oct 24 '20 at 20:04
  • [Windows-1251](https://en.wikipedia.org/wiki/Windows-1251) contains Latin and Cyrillic script, it cannot possibly encode Georgian, Chinese or Japanese. I understand you dislike this workaround (I do too), I'm only sharing my experience with Microsoft ODBC drivers (which if I'm not wrong are basically abandonware at present time). Perhaps you can find third-party drivers or try another library—I don't have experience to share about that. – Álvaro González Oct 25 '20 at 08:33
  • Found a workaround today... If I turn on `Use Unicode UTF-8 for worldwide language support` in `Windows 10` `Region Settings`, I am able to store multilingual strings like `αβγδАБВГҐґԱԲაბგვⴀⴁⴂⴃ驚いた彼` via `PHP-ODBC` with no problem. But it is still not a solution, just a workaround. In this case Windows 10 simply uses the Wide-String API for non-Unicode programs. – NoSkill Oct 25 '20 at 13:20
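
The conversion workaround discussed in the comments, and its limit, can be sketched as follows. Python is used for illustration only; in PHP the same step would be the `mb_convert_encoding($value, 'Windows-1251', 'UTF-8')` call quoted above. The helper name `to_ansi` is hypothetical.

```python
def to_ansi(text: str, codepage: str = "cp1251") -> bytes:
    """Encode text for a non-Unicode connection.

    Raises UnicodeEncodeError if a character has no mapping in the code page.
    """
    return text.encode(codepage)


# Cyrillic fits into Windows-1251: one byte per letter
print(to_ansi("Звезда"))

# Greek, Japanese and Chinese characters have no Windows-1251 mapping
try:
    to_ansi("αβγδ驚いた彼")
except UnicodeEncodeError as exc:
    print("not representable in Windows-1251:", exc.object[exc.start])
```

The conversion is lossless only for text that fits the target code page, which is why it does not count as a Unicode solution.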

1 Answer

PHP-ODBC is not a Unicode driver. PHP treats the connection as non-Unicode (ANSI), so strings are interpreted in the code page of the current system locale. You are not able to send arbitrary Unicode characters to a database via PHP-ODBC.

The only workaround is to turn on the option "Use Unicode UTF-8 for worldwide language support" in the Windows 10 Region Settings (note: this experimental feature is not available prior to Windows 10). In this case Windows 10 uses the Wide-String API for non-Unicode programs.

NoSkill