6

I have used utf8mb4 as the encoding and character_set_connection and character_set_database have been set as utf8mb4. The charset in my webpages is set to utf8. I have used PDO and when I open the connection to the database I use utf8mb4 as the charset. The columns in the tables also have utf8mb4_unicode_ci as collation. The storage engine used is MyISAM.

The difference occurs because I can not edit character_set_server(since I am on shared hosting), it is set by default as utf8 and the collation_server by default is utf8_unicode_ui. I cannot change these by any means. It worked fine on localhost with similar character variables.

The connection variable is $conn which is opened as
$db_server = 'localhost'; $db_name = 'userdb'; $db_encoding = 'utf8mb4'; $db_user = 'test'; $db_pass = 'password'; $conn = new PDO('mysql:host='.$db_server.';dbname='.$db_name.';charset='.$db_encoding,$db_user,$db_pass);
The form for taking input is as simple as
<form action="" method="POST" accept-charset="UTF-8"><input type="text" class="form-control" name="inp"></form>
And the input is processed as
$test = $_POST['inp']; $it = $conn->prepare("INSERT INTO tbl_test(message) VALUES(?)"); $it->bindParam(1,$test,PDO::PARAM_STR); if($it->execute()) echo $test; else echo 'Error!';

So the question is:

  1. Why does the problem occur even when I have used utf8mb4 and utf8mb4_unicode_ci wherever possible?
    2.Could this be a php problem rather than being a MySQL problem since the problem occurs during INSERTION in database(I have used stripslashes(), trim() and htmlspecialchars() functions for the strings)?
    3.Interestingly the emojis(unicode characters) work finely on one page of the website that is the chat page. Could it be due to encryption? There seems no possible explanation!

Since I am on shared hosting, I cannot edit the MySQL configuration file.

E_net4
  • 27,810
  • 13
  • 101
  • 139
Vipul Sharma
  • 525
  • 9
  • 23
  • You should change your question text blob into a list. While it sounds like you took all the right steps (collation doesn't matter as much as charset/encoding), this isn't answerable without some hexdumps of input, hexdump of stored values, check with mysql CLI client etc, db insert code sample, and so on. Test with SQLite alternatively to verify if it's really the database, not your form handling. See also: [UTF-8 all the way through](//stackoverflow.com/q/279170) – mario Oct 14 '18 at 15:48
  • @mario i went through that question, seems i have done the steps already. – Vipul Sharma Oct 14 '18 at 16:56
  • "I have used stripslashes(), trim() and htmlspecialchars() functions" → none of which are in any way relevant nor beneficial to your issue. Is there a particular reason you won't go into details (code/data samples)? What random suggestion are you hoping for that hasn't been covered in "nearly all the solutions on SO"? – mario Oct 14 '18 at 17:26
  • VTC: Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and **the shortest code necessary to reproduce it** in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a [mcve]. – mario Oct 14 '18 at 17:27
  • edited @mario although there is nothing such special about the code, and yeah sometimes stripslashes and magic quotes come in the way! – Vipul Sharma Oct 14 '18 at 18:15
  • That doesn't look like you read the reference question at all. No charset specified on the form, no command charset, no SQL parameter binding. – mario Oct 14 '18 at 18:29
  • Yes i have tried that too, the code got a little lengthy in that case! Guess you are ok with that now! – Vipul Sharma Oct 14 '18 at 18:34
  • What does `echo bin2hex($text);` give you? – Rick James Oct 16 '18 at 01:12

3 Answers3

14

After wasting a lot of time with character variables and mysql configurations, I fixed the problem with a simple solution that is base64_encode.
Passing the input string through base64_encode and later using base64_decode to show them worked in this case. However the strings take up more space than earlier but this solution is really simple and worthy.
I just posted this question to know if someone ever faced something similar but people really do not get the main point of the question. That is where the people on SO get really exhausting. Anyways thanks for your help people! :)

Vipul Sharma
  • 525
  • 9
  • 23
  • Don't listen to these haters - this was exactly what I have been looking for. Azure does not let you configure their database at all. This is a great workaround! Upvoted :) – eric k atwood Mar 10 '20 at 00:46
  • If I understand correctly, database side you use the utf8mb4 as storage type. And you use base64encode and decode PHP side to write and read to and from the database? – Gilles Lesire May 03 '21 at 10:01
  • So database side you use the utf8mb4 as storage type. And you use base64encode and decode PHP side to write and read to and from the database. Works great here! – Gilles Lesire May 04 '21 at 11:01
  • Working like a charm. Thank you for the quick solution. – Satheez Jun 21 '21 at 17:08
1

There are multiple things that need to be declared utf8mb4 -- the column, the connection, the form, etc. Offhand, it looks like you did all of them.

See this for what causes "question mark": Trouble with UTF-8 characters; what I see is not what I stored

You talked about 2 character_set variables, but not the 3 that are needed:

| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_results    | utf8mb4                    |

See if your php.ini has default_charset UTF-8

Do not use any form of encode/decode, it just avoids fixing the problem.

Please show us echo bin2hex($text); for some emoji.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • the 3 variables you mentioned above are set to utf8mb4 too. Could you give the reason why passing it through base64 makes it work? – Vipul Sharma Oct 16 '18 at 06:32
  • Passing a through bin2hex gives f09f8d94 – Vipul Sharma Oct 16 '18 at 06:34
  • And i cannot view my php.ini – Vipul Sharma Oct 16 '18 at 06:35
  • @VipulSharma - base64 turns it into a string of English characters -- English does not get mangled by character set issues. `f09f8d94` looks correct for that Emoji. – Rick James Oct 16 '18 at 15:54
  • Strange. I don't see anything wrong at this point. Can you create a _minimal_ test case that demonstrates the problem? (Very short PHP script, `CREATE TABLE`, etc.) Either something will jump out during this exercise, or it will give us something to experiment with independently. – Rick James Oct 16 '18 at 15:56
  • I made a test script too! As was expected it too didn't show emojis. Do you need to see the script? – Vipul Sharma Oct 17 '18 at 07:14
  • @VipulSharma - Yes, please. I would like to test it and maybe tweak it. – Rick James Oct 17 '18 at 16:37
0

I've changed charset for PDO db connection to utf8mb4 and wchole database collation to utf8mb4_general_ci, and emoticons finally worked!

E_net4
  • 27,810
  • 13
  • 101
  • 139