1

I am adding the emojis using their html codes (example: & # 1 2 8 0 7 0 ; ) using a javascript function in a contenteditable html element which displays the emoji (example: )

I then retrieve the content of the element with

var content = el.innerText;

I add the content in a json

var myjson = {"content" : content};

I send the json through ajax with content-type being

application/json

In, the backend i convert the json to a pojo using Gson, then persist it in the database using hibernate

In the servlet, using system.out.println, the content displays as ð in the db, a select query displays it as ð (with small numbers inside the rectangles)

I spent hours searching, setting the character set as utf8mb4 in the DB as well as the hibernate config file...example as described in the following links

Caused by: java.sql.SQLException: Unsupported character encoding 'utf8mb4'

How to store Emoji Character in MySQL Database

I'm a bit lost with it. My need is to persist a text containing emoji and be able to retrieve it(through hibernate or simple jdbc) and display it in a webpage.

What would be the best approach?

bkn
  • 21
  • 1
  • 11
  • I cannot post an answer, because - clearly - you have many parts working together. There are two primary points you have to keep in mind: **POINT 1:** An `Emoji` is **not a character** - it is just a sequence of characters from the **UNICODE** character set. An `Emoji` can be represented from as many as _5 16-bit characters_ lined up in a row. **POINT 2:** Not all editors, text-viewers, and User Interfaces are capable of _rendering an `Emoji`_ - even though the characters themselves are all right in the `String`. – Y2020-09 Oct 06 '20 at 13:37
  • I have tried retrieving the text from the DB and just printing it on the webpage, I can see that it displays �\u009f\u0091\u0086 for one emoji...any idea on how to display the emoji on a webpage ? – bkn Oct 06 '20 at 14:17

2 Answers2

1

Try this at database level:

Modify databases, tables, and columns Change the character set and collation properties of the databases, tables, and columns to use utf8mb4 instead of utf8.

For each database:

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

For each table:

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

For each column:

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Check the maximum length of columns and index keys

show VARIABLES like 'ver%'

SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';

====result should be==== character_set_client utf8mb4 character_set_connection utf8mb4 character_set_database utf8 character_set_filesystem binary character_set_results utf8mb4 character_set_server utf8 character_set_system utf8 collation_connection utf8mb4_general_ci collation_database utf8_general_ci collation_server utf8_general_ci

Repair and optimize all tables For each table

REPAIR TABLE table_name; OPTIMIZE TABLE table_name; $ mysqlcheck -u root -p –auto-repair –optimize –all-databases

Try this at database level. Your problem might be linked to a database.

  • I like this answer too. I gave an **upvote** - since you are facing the data-base side of the communication chain (which I don't do anymore - **SQL**). I just want to add, though, that I think the smartest way to deal with it is to save the `Emjoi` using the **text `String`** (`"👆"`) that is obtained via Java-Scipt using `"" + s.codePointAt(0) + ";";` because he intends to insert that same `Emoji` into another returned web-page. In that case **he would not need to** change any of the character-encodings at the data-base level... But there are multiple ways to deal with this... – Y2020-09 Oct 07 '20 at 13:49
0

Here, i got flustered because of the Java-Script which I have gotten away from in the past while... Play with this page, to see how Emoji's are represented.

CodePoint is not a change in representation, it is a change in interpretation... There is simply not enough space to explain eveything about escape sequences and character representation in a single Stack Oveflow answer... This is going to HiLite the major issues when retrieving an Emoji from an HTML Page.

<HTML>
<HEAD>
<TITLE>Emoji</TITLE>
<SCRIPT TYPE='text/javascript'>
function show()
{
    // Prints the EMOJI ITSELF... (I can't enter that here)
    var s = document.getElementById("thediv").innerHTML;
    alert(s);

    // Prints characters:  � � 
    //
    // NOTE: The UNICODE characters (55357 and 56390) are unprintable
    //       Therefore the question mark (�) is printed
    var s2 = " ";
    for (var i=0; i < s.length; i++) s2 = s2 + (s.charAt(i)) + ' ';
    alert(s2);
    
    // Prints Characters: 55357 56390 
    //
    // NOTE: This is guaranteed to be a number between 1 and 65,535
    //       which 2^16 - and UNICODE is a 16 bit space.
    var s3 = " ";
    for (var i=0; i < s.length; i++) s3 = s3 + (s.charCodeAt(i)) + ' ';
    alert(s3);
    
    // Prints Characters: 128070 56390 
    //
    // NOTE: here, the SECOND NUMBER is IRRELEVANT... It is "incorporated" in the first..
    //       As mentioned, interpreting two characters in a row using
    //       CODE-POINT means, the second number and the first are "Chomped" together.
    // The second number is called the "High Surrogate"
    var s4 = " ";
    for (var i=0; i < s.length; i++) s4 = s4 + (s.codePointAt(i)) + ' ';
    alert(s4);

    // Prints the String "&#128070;"
    //
    // YOU SHOULD PROBABLY BE SENDING THIS TO YOUR DATABASE...
    var s5 = "&#" + s.codePointAt(0) + ";";
    alert(s5);
}
</SCRIPT>
</HEAD>
<BODY>

<H1>Divider</H1>
<DIV ID="thediv">&#128070;</DIV>
<BR /><BR /><BUTTON onclick="show();">Click Me</BUTTON>
</BODY>
</HTML>

The above code will send the following java-script alerts:

document.getElementById("thediv").innerHTML; The rendered Emoji

s.charAt(i) The individual characters

s.charCodeAt(i) The Character Codes

s.codePointAt(i) The Code points

"&#" + s.codePointAt(0) + ";"; The HTML Escaped Emoji

Y2020-09
  • 1
  • 5