How can I read/interpret the emoticons/Unicode characters in a string?

I am creating a CSV export of a data-grid, and would like to create a library of string representations of Twitter emoticons. I would like to replace the emoticon with its string representation.

This is an example of a string:

😂😂😂 Absa!!!!

This is what the CSV version looks like:

😂😂😂 Absa!!!! 

I would like to render it something like this:

(FACE WITH TEARS OF JOY) (FACE WITH TEARS OF JOY) (FACE WITH TEARS OF JOY) Absa!!!! 

I got the details of the Unicode, Bytes (UTF-8) and emoticons from this site: http://apps.timwhitlock.info/emoji/tables/unicode

😂 = U+1F602    \xF0\x9F\x98\x82    FACE WITH TEARS OF JOY
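For reference, the columns of that table row are three views of the same character. The sketch below uses only built-in JavaScript string functions to show how the code point, the UTF-16 surrogate pair (the `\uXXXX` form used in the regex further down) and the UTF-8 bytes relate. The variable names are illustrative only.

```javascript
// The emoji from the table row, written as its UTF-16 surrogate pair.
var emoji = '\uD83D\uDE02';

// Code point, formatted the way the table shows it.
var codePoint = 'U+' + emoji.codePointAt(0).toString(16).toUpperCase();

// UTF-16 code units: what charCodeAt() and \uXXXX regex escapes operate on.
var codeUnits = [emoji.charCodeAt(0), emoji.charCodeAt(1)].map(function (unit) {
    return '\\u' + unit.toString(16).toUpperCase();
}).join('');

// UTF-8 bytes: encodeURIComponent percent-encodes the UTF-8 form of the string.
var utf8Bytes = encodeURIComponent(emoji).replace(/%/g, '\\x');

console.log(codePoint);    // U+1F602
console.log(codeUnits);    // \uD83D\uDE02
console.log(utf8Bytes);    // \xF0\x9F\x98\x82
console.log(emoji.length); // 2 (one emoji, two UTF-16 code units)
```

Because a JavaScript string is indexed by UTF-16 code units, any matching done with `charCodeAt` or `\uXXXX` escapes has to treat this emoji as the two-unit sequence, not as a single character.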

I don't even know where to start! I assume I need a regex plus a bunch of if statements: if an emoticon matches the regex, it gets replaced with its text version.

I have found a bunch of useful posts about removing emoticons, but none about replacing them. Here is one such example:

/(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|[\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c[\ude32-\ude3a]|[\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g

There are a bunch of other useful answers in the same post: How to remove emoji code using javascript?

I would appreciate your feedback and input and suggestions!

Thank you!

onmyway

1 Answer

Here is my solution:

  1. I created a dictionary of all emoticons (the example below contains only 2 dictionary items).
  2. I created a function that accepts a string and two optional boolean parameters (see the comments in the Angular service code).

Outcome: each emoticon is converted to its string representation, which is useful for sentiment analysis.

(function () {
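    /**
    * Documents replaceEmoji(str, fill, omitSkinColour), defined further down.
    *
    * @param    {string}    str                 String to convert
    * @param    {boolean}   [fill]              If true, leaves each emoji in place and inserts its description after it
    * @param    {boolean}   [omitSkinColour]    If true, strips skin-colour modifiers before matching, so they appear in neither the emoji (when fill is true) nor the description
    * @returns  {string}                        String in which each known emoji is replaced by (or, with fill, followed by) its description
    */
    'use strict';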

    angular
        .module('portalDashboardApp')
        .factory('ReplaceEmojiService', ReplaceEmojiService);

    ReplaceEmojiService.$inject = [];

    function ReplaceEmojiService() {

        var service = {
            replaceEmoji: replaceEmoji
        };

        return service;

        function replaceEmoji(str, fill, omitSkinColour) {

            var dictionary = {
                "35": {
                    "8419": {
                        "name": "keycap: #"
                    },
                    "65039": {
                        "8419": {
                            "name": "keycap: #"
                        }
                    }
                },
                "42": {
                    "8419": {
                        "name": "keycap: *"
                    },
                    "65039": {
                        "8419": {
                            "name": "keycap: *"
                        }
                    }
                }  
            }

            fill = fill || false;
            omitSkinColour = omitSkinColour || false;

            if (omitSkinColour) {
                str = str.replace(/(?:\uD83C[\uDFFB-\uDFFF])/g, '');
            }

            var newStr = '';

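            for (var i = 0; i < str.length;) {
                // Walk the nested dictionary one UTF-16 code unit at a time.
                var localDict = dictionary;
                var j = i;
                var char = str.charCodeAt(j);

                while (localDict.hasOwnProperty(char)) {
                    localDict = localDict[char];
                    char = str.charCodeAt(++j);
                }

                if (localDict.name) {
                    // Full match: str.slice(i, j) is the emoji sequence.
                    if (fill) {
                        newStr += str.slice(i, j) + '(' + localDict.name + ')';
                    } else {
                        newStr += '(' + localDict.name + ')';
                    }
                    i = j;
                } else {
                    // No match, or only a partial one: copy a single code unit
                    // and retry from the next one, so no characters are dropped.
                    newStr += str[i];
                    i += 1;
                }
            }
            return newStr;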
        }
    };

})();
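Typing the nested entries by hand gets tedious for a full emoji list. Below is a minimal sketch of how they could be generated, assuming the same nested shape as `dictionary` above (one level per UTF-16 code unit, with `name` on the last level). `addEmoji` is a hypothetical helper and is not part of the service.

```javascript
// Hypothetical helper: adds one emoji to a nested dictionary keyed by
// UTF-16 code units (the same shape as `dictionary` in the service).
function addEmoji(dictionary, emoji, name) {
    var node = dictionary;
    for (var i = 0; i < emoji.length; i++) {
        var unit = emoji.charCodeAt(i);
        if (!node.hasOwnProperty(unit)) {
            node[unit] = {};
        }
        node = node[unit];
    }
    // The last level carries the description used as the replacement text.
    node.name = name;
    return dictionary;
}

var dictionary = {};
addEmoji(dictionary, '\uD83D\uDE02', 'FACE WITH TEARS OF JOY'); // U+1F602
addEmoji(dictionary, '#\u20E3', 'keycap: #');

console.log(JSON.stringify(dictionary));
// {"35":{"8419":{"name":"keycap: #"}},"55357":{"56834":{"name":"FACE WITH TEARS OF JOY"}}}
```

With the service's `dictionary` populated this way, `replaceEmoji('😂 Absa!!!!')` would return `(FACE WITH TEARS OF JOY) Absa!!!!`.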
onmyway