
I'm trying to save Turkish characters into my database, which normally runs with utf8_general_ci, with UTF-8 on the website. But since this project is from Turkey, I cannot seem to successfully save the characters that users enter on their website into our database.

It currently saves like this:

Kırıkkale
İstanbul

The code I'm using to convert characters in php before saving to database is:

iconv("ISO-8859-1", "UTF-8", $city);

In the website header I use:

<html lang="tr-TR"> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />

Does anyone know how I can encode this properly so that it is stored as readable data? And is there any chance I can convert the data that is already stored back into readable data?

[Image: database preview]

TTS
  • You need to set the charset to UTF-8 throughout your application. Don't mix charsets, it'll only cause you headaches. I have previously written [**an answer about encoding, a little checklist of sorts**](https://stackoverflow.com/questions/31897407/mysql-and-php-utf-8-with-cyrillic-characters/31899827#31899827) for dealing with this. There's also the much more in-depth [UTF8 All The Way Through](https://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – Qirel Mar 15 '19 at 11:39
  • OK thanks, I have followed the checklist. Any idea if it is possible to convert the currently stored data back into readable Turkish characters? For example, reverse İstanbul to readable data: Istanbul? – TTS Mar 15 '19 at 11:48
  • If the data was already mangled when it was stored, it's going to be hard. It's probably easier to just reset the data if you can. There are ways to repair it, if you know the original encoding and the data is not mangled beyond repair. – Qirel Mar 15 '19 at 11:48
  • Lots of tips on the internet for fixing mangled strings stored in MySQL, for example https://stackoverflow.com/q/29710565/318758 – Joni Mar 15 '19 at 12:18
  • So I tried these tips, but none of them seem to work. I have added an img of the database with the unreadable characters. – TTS Mar 15 '19 at 13:18
  • Did you already try to 'fix' the data? Your symptoms are not of the ordinary things like Mojibake, but rather a mess of 2, maybe 3 things going wrong. – Rick James Mar 16 '19 at 04:09

1 Answer


Assuming you start with a dotted capital I and then Mojibake it 3 times, you can get Ã„Â°stanbul.

İ --> İ --> Ä° --> Ã„Â°

In hex (for utf8), that is

C4B0 --> C384C2B0 --> C383E2809EC382C2B0 --> C383C692C3A2E282ACC5BEC383E2809AC382C2B0

For example, C4 B0 is the single character `İ` in utf8, but the 2 characters `İ` in latin1.

Mojibake occurs when one hand thinks the encoding is, say, utf8 while the other hand thinks it is, say, latin1.
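The chain above can be reproduced outside MySQL. The following is a minimal Python sketch, not part of the original answer: it assumes Python's `cp1252` codec is a fair stand-in for what MySQL calls latin1, and the helper name `mojibake_once` is made up for illustration.

```python
def mojibake_once(text: str) -> str:
    """One round of Mojibake: write the text as UTF-8 bytes,
    then misread those bytes as cp1252 (assumed stand-in for MySQL latin1)."""
    return text.encode("utf-8").decode("cp1252")


# Start with the dotted capital I (U+0130) and mangle it three times.
stages = ["\u0130"]
for _ in range(3):
    stages.append(mojibake_once(stages[-1]))

# Print each stage next to the hex of its UTF-8 encoding,
# which can be compared with the hex chain in the answer.
for stage in stages:
    print(stage, stage.encode("utf-8").hex().upper())
```

Each round re-encodes the already wrong characters, so the byte length grows from 2 to 4 to 9 to 20 bytes for what started as a single character.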

For Turkish, you need to stay with UTF-8 (which MySQL calls utf8 or utf8mb4) everywhere: in the page, in the connection, and in the table columns.

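CONVERT(BINARY(CONVERT( CONVERT(BINARY(CONVERT('Ä°' USING latin1)) USING utf8mb4) USING latin1)) USING utf8mb4)

will turn Ä° back into İ; each CONVERT(BINARY(CONVERT(... USING latin1)) USING utf8mb4) layer undoes one round of Mojibake. A third iteration should undo the mess you have.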
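The same repair can be tried on a few sample strings before touching the table. This is a hedged Python sketch rather than the answer's method: it again assumes `cp1252` matches MySQL's latin1, and `undo_once` and `repair` are hypothetical helper names. It should be tested on a copy of the data first.

```python
def undo_once(text: str) -> str:
    """Reverse one round of Mojibake: recover the bytes that were
    misread as cp1252 and decode them as UTF-8 instead."""
    return text.encode("cp1252").decode("utf-8")


def repair(text: str, max_rounds: int = 3) -> str:
    """Undo up to max_rounds rounds, stopping early once the text
    can no longer be interpreted as mangled UTF-8."""
    for _ in range(max_rounds):
        try:
            text = undo_once(text)
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not (or no longer) Mojibake, keep what we have
    return text


# Build a triple-mangled sample, as described in the answer, then repair it.
mangled = "\u0130stanbul"  # "İstanbul"
for _ in range(3):
    mangled = mangled.encode("utf-8").decode("cp1252")

print(repair(mangled))
```

Capping the number of rounds is a deliberate choice: correctly stored text that happens to look like Mojibake would otherwise be "repaired" into something wrong, so the cap should match how many times the data was actually mangled.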

Rick James