3

I am working on a turkish website, which has stored many malformed turkish characters in a MySQL database, like:

 - ş as þ
 - ı as ý
 - ğ as ð
 - Ý as İ

i can not change the data in the database, because the database are updated daily and the new data will contain the malformed characters again. So my idea was to change the data in PHP instead of changing the data in the database. I have tried some steps:

Turkish characters are not displayed correctly

Fix Turkish Charset Issue Html / PHP (iconv?)

PHP Turkish Language displaying issue

PHP MYSQL encoding issue ( Turkish Characters )

I am using the PHP-MySQLi-Database-Class available on GitHub with utf8 as charset.

I have even tried to replace the malformed characters with str_replace, like:

$newString = str_replace ( chr ( 253 ), "ı", $newString );

My question is, how can i solve the issue without changing the characters in the database? Are there any best practices? Is it a good option just to replace the characters?

EDIT: solved it by using

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />
Community
  • 1
  • 1
Doğan Uçar
  • 166
  • 3
  • 15
  • Have you identified which wrong character set was used to write the values into the database? You could try to set your `character_set_client` to the wrong character set, read the data, then write the data back using the correct one. – 0xCAFEBABE Nov 25 '15 at 16:44
  • do you mean select, convert and insert in the tables? – Doğan Uçar Nov 25 '15 at 16:52
  • Judgind by what you describe, the encoding was wrong when somebody inserted data into the database. The most consistent way to rectify that would be to use the same encoding to read the wrongfully encoded data back, then overwrite it into the database with the correct encoding. – 0xCAFEBABE Nov 26 '15 at 08:22
  • thank you. But currently i resolved it by changing the meta charset HTML tag to iso-8859-9. – Doğan Uçar Nov 30 '15 at 21:28
  • Please provide `SELECT col, hex(col) FROM tbl WHERE ...` so we can see what has been stored. Also, `SHOW CREATE TABLE` – Rick James Dec 11 '15 at 00:47

3 Answers3

2

2022 update. I made a wide research and I found this solution and it's working. let's say your db_connection is $mysqli:

$mysqli = mysqli_connect($hostname, $username, $password, $database) OR DIE ("Baglanti saglanamadi!");

just add this line after. it works like magic with all languages even Arabic:

mysqli_set_charset($mysqli, 'utf8');
AMN
  • 21
  • 3
0

Two solutions are good

PHP MYSQL encoding issue ( Turkish Characters )

PHP Turkish Language displaying issue

Also you can set configuration on phpMyAdmin

Operations > Table options > Collation > select utf8_general_ci

if you create the tables already edit the collation structures also

Community
  • 1
  • 1
Fahed Alkaabi
  • 269
  • 2
  • 10
0

SELECT CONVERT(CONVERT(UNHEX('d0dddef0fdfe') USING ...) USING utf8);

latin5 / iso-8859-1 shows ĞİŞğış
latin1 / iso-8859-9 shows ÐÝÞðýþ

You are confusing two similar encodings; see the first paragraph in https://en.wikipedia.org/wiki/ISO/IEC_8859-9 .

"Collation" is only for sorting. But first you need to change the CHARACTER SET to latin5. Then change the collation to latin5_turkish_ci. (Since that is the default for latin5, no action need be taken.)

This may suffice to make the change in MySQL: EDIT 3

NO, this is probably wring -- ALTER TABLE tbl CONVERT TO CHARACTER SET latin5;

After seeing more of the issue, this "2-step ALTER" is probably correct:

ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET latin5 ...;

Do that for each table. Be sure to test this on a copy of your data first.

The 2-step ALTER is useful for when the bytes are correct, but the CHARACTER SET is not.

CONVERT TO should be used when the characters are correct, but you want a different encoding (and CHARACTER SET). See Case 5.

Edit 1

E7 and FD and cp1250, dec8, latin1 and latin2 for ç and ý. FD in latin5 is ı. I conclude that your encoding is latin1, not latin5.

You say you cannot change the "scripts". Let's look at your limitations. Are you restricted on the INSERT side? Or the SELECT side? Or both? What is rendering the text; html? MySQL is willing to change from latin1 to/from latin5 and you insert/select (based on a few settings). And/or you could lie to HTML (via a meta tag) to get it to interpret the bytes differently. Please spell out the details of the data flow.

Edit 2

Given that the HEX in the table is E7FD6B6172FD6C6D6173FD6E61, and it should be rendered as çıkarılmasına, ... Note especially the second letter needs to show as ı (Turkish dotless small I), not ý (small Y with acute), correct?

Start by trying

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-9"/>

That should give you the `latin5 rendering, as you already found out. IANA Reference.

As for "Best practice", that would involve changing the way text is inserted. You have stated this as off-limits.

Apparently you have latin5 characters stored in a latin1 column. Since latin1 does not involve any checking, you can insert and retrieve latin5 characters without any trouble.

This does not address the desire to have Turkish collation. If necessary, I can probably concoct a way to specify Turkish ordering on particular statements; please provide a sample statement.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • altering the table to latin5 fails. The error is: Error Code: 1366. Incorrect string value: '\xFEand\xFD....' for column 'caption' at row 1 currently, i am using meta charset HTML tag to iso-8859-9 in HTML and it is working.. but i dont know whether it is a good solution or not – Doğan Uçar Dec 29 '15 at 20:48
  • Hmmm... Let's backup and check something. Please perform `SELECT col, HEX(col) FROM tbl WHERE ...` to show something that is coming out 'wrong'. And specify what you expect that column to say for that row. – Rick James Dec 30 '15 at 00:13
  • Problem is, the values are bad encoded stored in the tables. I can only change the tables, but dont have any chance to change the scripts, which inserts the data... Example: col: 'Ekonomik krizin faturasýný kendilerine çýkarýlmasýna kýzan emekçiler, sokaða çýkarak tepki gösterdi.', hex(col):'456B6F6E6F6D696B206B72697A696E2066617475726173FD6EFD206B656E64696C6572696E6520E7FD6B6172FD6C6D6173FD6E61206BFD7A616E20656D656BE7696C65722C20736F6B61F06120E7FD6B6172616B207465706B692067F67374657264692E' – Doğan Uçar Dec 30 '15 at 23:42
  • i am restricted on the insert side. I dont have access to the scripts which inserts the data. On the select side i can do what i want. The text is rendered by HTML. that sounds interesting. So i should change now to latin1? what are the next steps? – Doğan Uçar Jan 04 '16 at 01:28
  • First, let me see `SHOW CREATE TABLE` so I know what character set the column is set to. – Rick James Jan 04 '16 at 04:16
  • hey, sorry my late reply. SHOW CREATE TABLE is too long for posting as a comment, is it enough to see this?: ENGINE=MyISAM AUTO_INCREMENT=48501 DEFAULT CHARSET=latin1 – Doğan Uçar Jan 08 '16 at 14:16
  • Hey Rick, so you are right. With the above stated meta tag i can render the turkish characters. But it is not possible to use the strings for example for json_encode(). How to solve this? – Doğan Uçar Jan 08 '16 at 14:20
  • JSON needs utf8, not latin1. – Rick James Jan 08 '16 at 18:06
  • The "meta" tag happens to make it work, but it is a kludge. – Rick James Jan 08 '16 at 18:17
  • I have included your problem in my charset tome: http://mysql.rjweb.org/doc.php/charcoll#turkish – Rick James Jan 08 '16 at 18:17
  • so and you mean, the only way to have a clean database is to change the way how it is inserted? thank you for including my problem. But what is happening now with the problem xD – Doğan Uçar Jan 08 '16 at 22:08
  • The "right" solution is to be entirely latin5 or entirely utf8 (utf8 also allows json to work). "Entirely" means the bytes are encoded that way _and_ the connection is that way _and_ the table (or at least the relevant columns) are that way _and_ the meta tag is that way. – Rick James Jan 08 '16 at 22:42
  • Currently, you are "lucky" that latin1 and latin5 don't really care what the bytes look like. So, the fact that you have mixed latin1 and latin5 does not "hurt"; the meta tag "saves the day". – Rick James Jan 08 '16 at 22:43
  • ok, thanks a lot Rick.. but i have one last question.. lets assume, i will change the way inserting the data and change the charset of the columns, tables and databases.. how can i change the bad encoded/formatted characters (how can i convert "çýkarýlmasýna" to "çıkarılmasına"?) – Doğan Uçar Jan 09 '16 at 22:44
  • Well, that's the beauty of it. latin1 (and cp1250, dec8, latin2) `ý` is encoded hex `FD`. latin5 `ı` is also encoded `FD`. That is, the _bytes_ in your table are correct; it's just the _interpretation_ of them that leads to the wrong "character". I think the correct way to change the column definition from `latin1` to `latin5`, without modifying the bytes, is to use the [2-step ALTER](http://mysql.rjweb.org/doc.php/charcoll#2_step_alter). – Rick James Jan 09 '16 at 22:57
  • I'm sorry this is taking so long. There are 4 steps to do to "do it right"; that means at least 15 combinations of how to "do it wrong". It is non-trivial to tease out what went wrong. And your case wrong using latin1 in some or all of the first 3 steps without impacting the result. The 4th step (meta tag) "fixed" the problem, but left both you and me perplexed as to what was going on. – Rick James Jan 09 '16 at 23:04
  • latin5 covers Turkish and English, but not much else. If you ever need more than those, you will need to move to utf8 (or utf8mb4). At that point, all 4 steps must be in sync to get the dotless `ı`, etc. – Rick James Jan 09 '16 at 23:06
  • I see that I gave you the wrong advice (by not knowing all 4 steps). See Edit 3. – Rick James Jan 09 '16 at 23:18