For a few days now I've been looking for a way to display UTF-8 characters correctly on my webpage. The character currently causing trouble is į (Unicode: \u012f, decimal: 303). However, there are over 10,000 records in my database and I cannot guarantee that all the others display correctly, so I'm looking for a solution that covers all characters.
The į is displaying as a ? in the HTML.
My setup is an HTML page which uses AJAX to send a request to a PHP file. The PHP then queries a MySQL database to find a specific entry, takes a Lithuanian word from that entry, and echoes it as the response to the AJAX request. Back in the JavaScript, the response is set as the innerHTML of an HTML element. This setup does not use jQuery.
Below is my progress on attempting to fix the issue.
First, I verified that all the files I was working with are encoded as UTF-8 without a BOM.
Then I opened the MySQL database in phpMyAdmin to view the entries. Seeing characters replaced with ? in the entries, I did some research and found that the database had the wrong collation. Changing the collation to utf8_general_ci for the database and the table changed nothing, so I looked into it further and found that changing it for the individual columns of the table was another solution. This worked, and phpMyAdmin now displays the characters correctly.
Next, the character š (Unicode: \u0161, decimal: 353) would not display on my webpage. I fixed this with the following PHP code, which I found on Stack Overflow.
    function encode_string($string){
        $encoded = "";
        for ($n=0;$n<strlen($string);$n++){
            $check = htmlentities($string[$n],ENT_QUOTES);
            $string[$n] == $check ? $encoded .= "&#".ord($string[$n]).";" : $encoded .= $check;
        }
        return $encoded;
    }
I can't say I completely understand this code, but it caused the character š to display correctly once it reached my HTML. However, it did not work for the character į.
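To make my understanding of that loop concrete, here is a minimal sketch in JavaScript (Node.js) rather than PHP. It is only an approximation: as far as I can tell, strlen() and $string[$n] work on bytes, so a UTF-8 character such as į is seen as two separate bytes. The helper names entitiesPerByte and entitiesPerCodePoint are my own, and the sketch ignores the escaping of characters like < and & that htmlentities() also performs.

```javascript
// Sketch only: approximates a byte-wise loop, it is not the PHP function itself.
// Both helper names are hypothetical.

// Emit one numeric entity per UTF-8 byte >= 0x80, as a byte-wise loop would.
function entitiesPerByte(text) {
  let out = "";
  for (const byte of Buffer.from(text, "utf8")) {
    out += byte < 0x80 ? String.fromCharCode(byte) : "&#" + byte + ";";
  }
  return out;
}

// Emit one numeric entity per Unicode code point above ASCII.
function entitiesPerCodePoint(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp < 0x80 ? ch : "&#" + cp + ";";
  }
  return out;
}

const word = "\u012F"; // į
console.log(Buffer.from(word, "utf8").length); // 2 (bytes 0xC4 0xAF)
console.log(entitiesPerByte(word));            // &#196;&#175;
console.log(entitiesPerCodePoint(word));       // &#303;
```

If this is right, only the per-code-point form produces the decimal value 303 that I listed for į above, while the per-byte form produces two unrelated entities.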
I have also tried $conn->set_charset('utf8'); to make the connection use utf8; however, this resulted in į being displayed as Ä¯ instead. I got the same result with $conn->query("SET NAMES UTF8;");
I have found that hardcoding the į into the JavaScript or PHP allows it to be sent back and displayed correctly; for example, echo "į"; works.
So I believe the issue is related to the database or to the PHP that runs before the echo.
However, I don't have the knowledge to identify the problem.
Here is my PHP code:
    <?php
    header('Content-Type: text/html charset=utf-8');
    //Connection to database is made. Referred to as $conn
    $sql = "SELECT * FROM Words";
    $result = $conn->query($sql);
    if ($result->num_rows > 0) {
        //Loop through the results to find a word with the status of 1
        while($row = $result->fetch_assoc()) {
            $status = $row["status"];
            if($status == 1){
                //respond to AJAX with the word
                $ltword = trim($row["lt"]);
                echo utf8_encode(encode_string($ltword));
                //Has also been tested as
                //echo encode_string($ltword);
                //with no noticeable difference.
                break;
            }
        }
    }
    function encode_string($string){
        $encoded = "";
        for ($n=0;$n<strlen($string);$n++){
            $check = htmlentities($string[$n],ENT_QUOTES);
            $string[$n] == $check ? $encoded .= "&#".ord($string[$n]).";" : $encoded .= $check;
        }
        return $encoded;
    }
    ?>
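One thing I am unsure about is the utf8_encode() call. As I understand the PHP manual, it converts ISO-8859-1 to UTF-8, so feeding it bytes that are already UTF-8 would encode them a second time. The sketch below imitates that in JavaScript (Node.js); latin1ToUtf8 is a hypothetical stand-in of mine, not a real PHP or Node function.

```javascript
// Hypothetical stand-in for PHP's utf8_encode(): treat every input byte as an
// ISO-8859-1 character and re-encode the resulting text as UTF-8.
function latin1ToUtf8(bytes) {
  return Buffer.from(bytes.toString("latin1"), "utf8");
}

const original = Buffer.from("\u012F", "utf8"); // į already encoded as UTF-8
const doubled = latin1ToUtf8(original);         // encoded a second time

console.log(original.toString("hex")); // c4af
console.log(doubled.toString("hex"));  // c384c2af
console.log(doubled.toString("utf8")); // Ä¯
```

The two original bytes become four, and a browser reading them as UTF-8 would show two characters instead of one.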
At its core, my question is: given my current setup, how do I get a UTF-8 encoded character from my database to display correctly on my webpage?
EDIT:
PHP's mb_check_encoding() function confirms that the data received from the database is valid UTF-8.
php.ini is using utf8 as its default charset.
$conn->character_set_name(); returns latin1.
After $conn->set_charset("utf8"); it returns utf8; however, į is then displayed as Ä¯, which is still incorrect.
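I am also not sure how much the validity check above really tells me. As far as I understand, a ? that has already replaced a character is plain ASCII and therefore still valid UTF-8. Here is a small JavaScript (Node.js) sketch of that idea; isValidUtf8 is a hypothetical helper of mine that is only similar in spirit to mb_check_encoding(), not a port of it.

```javascript
// Hypothetical helper: returns true when the bytes decode as strict UTF-8.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidUtf8(Buffer.from([0xc4, 0xaf]))); // true: the two bytes of į
console.log(isValidUtf8(Buffer.from([0x3f])));       // true: "?" is plain ASCII
console.log(isValidUtf8(Buffer.from([0xc4])));       // false: incomplete sequence
```

So passing the check would not, by itself, rule out that the character was lost before it reached PHP.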