
I have two forms on two different pages that insert data into a MySQL database. The form data contains special characters such as 'čšžćđ', which I pass via the forms to the insertion scripts.

The data from the first form is inserted correctly, while some fields from the second form end up containing '?' characters, which suggests an encoding mismatch.

The insertion scripts for both forms use the same file to connect to the database and set the encoding, shown below:

<?php

$username = "root";
$password = "";
$servername = "localhost";

// Connect and select the database in a single call
$conn = mysqli_connect($servername, $username, $password, "testdb");

if (!$conn) {  // check if connected; die() already stops the script
    die("Connection failed: " . mysqli_connect_error());
}

/* change character set to utf8 */
if (!mysqli_set_charset($conn, "utf8")) {
    // printf("Error loading character set utf8: %s\n", mysqli_error($conn));
} else {
    // printf("Current character set: %s\n", mysqli_character_set_name($conn));
}

//echo "Connected successfully.";

// Check if the correct db is selected
if ($result = mysqli_query($conn, "SELECT DATABASE()")) {
    $row = mysqli_fetch_row($result);
    //printf("\nDefault database is %s.\n", $row[0]);
    mysqli_free_result($result);
}
?>

I guess this means that the client character encoding isn't set correctly? All database tables use the utf8 character set.

TheAptKid
  • Add `header('Content-Type: text/html; charset=utf-8');` to set the page encoding to UTF-8. – Saty Jun 17 '16 at 13:26
  • http://stackoverflow.com/questions/279170/utf-8-all-the-way-through/279279 is pretty much the de facto canonical answer on character encoding issues on SO. – CD001 Jun 17 '16 at 13:29

2 Answers


Try setting the encoding at the top of the page, before any output is sent:

<?php

header('Content-Type: text/html; charset=utf-8');

// other code...

Are you talking about HTML forms? If so, set accept-charset on the form:

<form accept-charset="UTF-8">

Is it one ? per accented character? If you are trying to use utf8/utf8mb4 and see question marks (regular ones, not black diamonds), check the following:

  • The bytes to be stored are not encoded as utf8. Fix this.
  • The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this.
  • Also, check that the connection during reading is utf8.

The data was probably converted to ? when it was stored, so the original characters cannot be recovered from the stored text.
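
The first point in the list above can be checked in PHP before the INSERT runs. This is only a minimal sketch, assuming the mbstring extension is loaded and PHP 7 or later; `isValidUtf8` and the sample strings are hypothetical names for illustration and are not part of the original scripts.

```php
<?php
// Hypothetical helper: true if $value is a well-formed UTF-8 byte sequence.
// Assumes the mbstring extension is available.
function isValidUtf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

// "\u{...}" escapes produce UTF-8 bytes regardless of how this file is saved.
$utf8Input   = "\u{010D}\u{0107}";  // čć encoded as UTF-8
$latin2Input = "\xE8\xE6";          // čć encoded as ISO-8859-2, not valid UTF-8

var_dump(isValidUtf8($utf8Input));    // bool(true)
var_dump(isValidUtf8($latin2Input));  // bool(false)
```

If a check like this fails for the second form only, the page serving that form (or the form itself) is probably not submitting UTF-8.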

Run SELECT col, HEX(col) FROM ... to see what actually got stored.

  • ? is 3F in hex.
  • Accented European letters take two bytes (four hex digits) per character. That includes each of čšžćđ.
  • Chinese, Japanese, and Korean characters (mostly) take three bytes (six hex digits) per character.
  • Four bytes (eight hex digits) for a single accented letter would indicate "double encoding".
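
The byte patterns in the list above can be reproduced in PHP with bin2hex(), which helps when comparing against the HEX(col) output. This is a rough sketch, assuming PHP 7 or later and the mbstring extension; the "double encoding" case is simulated here by treating UTF-8 bytes as ISO-8859-1 and converting them to UTF-8 a second time, which is one common way the problem arises.

```php
<?php
$accented = "\u{010D}";  // č as UTF-8
$cjk      = "\u{4E2D}";  // 中 as UTF-8

echo bin2hex('?'), "\n";        // 3f
echo bin2hex($accented), "\n";  // c48d (two bytes)
echo bin2hex($cjk), "\n";       // e4b8ad (three bytes)

// Simulated double encoding: the UTF-8 bytes are misread as ISO-8859-1
// and encoded to UTF-8 again, turning two bytes into four.
$double = mb_convert_encoding($accented, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), "\n";    // c384c28d (four bytes)
```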
Rick James