Replace emoji with appropriate java code

Question

I am working on a simple Java program that can take a string like this:

⛔️✋STOP✋⛔️ You've violated the law! But now... You

and replace each emoji with the appropriate Java character. (I'm not sure what to call them.)

Here is an example:

The automobile emoji 🚗 would be replaced with "\uD83D\uDE97".

This allows me to have a string such as

"I am a car: \uD83D\uDE97"

in Java source code, and have it display as "I am a car: 🚗".

So the question is: how can I automatically find a certain emoji in a string (for example, every red car emoji) and replace it with its appropriate "Java character"?

EDIT ONE:

Never mind, it turned out to be really simple. I could just do

string.replace("🚗", "Java code");

Consider writing your own answer or marking someone else's answer as correct, since the problem is solved. – Sanchit, Apr 22 '16 at 20:17

Ioan · Answer 1 · 2016-04-22T20:33:22.287

1

You should use the following method for this purpose:

public String replaceAll(String regex, String replacement)

See documentation.

Example:

source
.replaceAll("", "value")
.replaceAll("", "nextValue")
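One caveat when the replacement text is an escape such as \uD83D\uDE97: replaceAll treats its first argument as a regex and gives backslashes (and $) in the replacement a special meaning. The sketch below compares String.replace with quoted and unquoted replaceAll calls. The class, method, and constant names are made up for illustration, and the car emoji is taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplaceDemo {
    // The car emoji from the question, written as a surrogate pair.
    static final String CAR = "\uD83D\uDE97";
    // The twelve visible characters we want in the output (backslashes doubled on purpose).
    static final String CAR_ESCAPE = "\\uD83D\\uDE97";

    // replace() takes both arguments literally, so no quoting is needed.
    static String withReplace(String source) {
        return source.replace(CAR, CAR_ESCAPE);
    }

    // replaceAll() needs the pattern and the replacement quoted to behave literally.
    static String withQuotedReplaceAll(String source) {
        return source.replaceAll(Pattern.quote(CAR), Matcher.quoteReplacement(CAR_ESCAPE));
    }

    // Without quoting, each backslash in the replacement is consumed as an escape marker.
    static String withUnquotedReplaceAll(String source) {
        return source.replaceAll(CAR, CAR_ESCAPE);
    }

    public static void main(String[] args) {
        String text = "I am a car: " + CAR;
        System.out.println(withReplace(text));
        System.out.println(withQuotedReplaceAll(text));
        System.out.println(withUnquotedReplaceAll(text)); // the backslashes are lost here
    }
}
```

For plain one-to-one substitutions like this, replace is probably the simpler choice. replaceAll is only needed when the search term really is a pattern.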

A nicer way to do it is to build a map of the strings you want to replace, and do the replacement in a for-each loop:

import java.util.LinkedHashMap;
import java.util.Map;

// LinkedHashMap keeps insertion order, so the replacements run in a predictable sequence
Map<String, String> mappedChars = new LinkedHashMap<>();
mappedChars.put("A", "valueForA");
mappedChars.put("B", "valueForB");

String value = "A and B and C";

for (Map.Entry<String, String> entry : mappedChars.entrySet()) {
    // replace() matches the key literally (no regex) and replaces every occurrence
    value = value.replace(entry.getKey(), entry.getValue());
}

// value is now "valueForA and valueForB and C"

edited Apr 22 '16 at 20:33

answered Apr 22 '16 at 20:14

Ioan


The way this is written, it doesn't really need to be a map. (You're not using the fast lookup at all.) But I think a map is probably a suboptimal structure for this task anyway. It might be OK if the number of code points per emoji were always 1, but in the general case, you're probably going to want something like an Aho-Corasick search over code points, because some emoji are composed of multiple code points. – Hakanai Mar 16 '18 at 02:38
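If the goal is only to turn every emoji into ASCII-safe Java escapes, one way to avoid a per-emoji lookup is to walk the string one UTF-16 code unit at a time and escape anything outside ASCII. This is a minimal sketch of that idea, not the Aho-Corasick approach mentioned in the comment. The class and method names are hypothetical. Because it works on code units, emoji made of several code points simply produce several escapes in a row.

```java
final class EmojiEscaper {
    // Returns the input with every non-ASCII UTF-16 code unit written as a
    // backslash-u escape followed by four uppercase hex digits.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 128) {
                out.append(c); // plain ASCII is copied unchanged
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The car emoji is written as escapes so this file itself stays ASCII-only.
        System.out.println(escapeNonAscii("I am a car: \uD83D\uDE97"));
    }
}
```

Note that this escapes all non-ASCII text, including accented letters, which may or may not be what you want.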

score 0 · Answer 2 · answered Apr 22 '16 at 20:20

Be aware that not all encodings can handle these strings. You could pass the string to a byte array, parse it and convert it back to a string.
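To illustrate the encoding point, here is a small sketch that sends a string through a byte array and back. The class and method names are made up for this example. UTF-8 can represent the car emoji from the question, while US-ASCII cannot, so the emoji is lost on the ASCII round trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingRoundTrip {
    // Encodes the text to bytes with the given charset and decodes it again.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String car = "I am a car: \uD83D\uDE97";
        // UTF-8 preserves the emoji; US-ASCII substitutes a replacement character.
        System.out.println(roundTrip(car, StandardCharsets.UTF_8).equals(car));
        System.out.println(roundTrip(car, StandardCharsets.US_ASCII).equals(car));
    }
}
```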

answered Apr 22 '16 at 20:20

gi097


score 0 · Answer 3 · edited May 23 '17 at 12:07

Those are Unicode characters. Finding them and replacing them are actually different tasks.

Finding the Unicode escapes in a string takes some care. One method would be to simply apply indexOf to your string to find the beginning of an escape sequence such as \uD83D.

The following is a rather inefficient example, meant more to illustrate the point than to run optimally.

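int unicodeCharacterLocation = str.indexOf("\\u");

// The backslash and the 'u' take two positions, so the four hex digits sit at offsets 2 to 5.
if (unicodeCharacterLocation >= 0 && unicodeCharacterLocation + 6 <= str.length())
{
    boolean allHexDigits = true;
    for (int offset = 2; offset <= 5; offset++)
    {
        // Character.digit returns -1 when the character is not a valid hex digit
        if (Character.digit(str.charAt(unicodeCharacterLocation + offset), 16) == -1)
        {
            allHexDigits = false;
        }
    }
    if (allHexDigits)
    {
        //You may have found a unicode character
    }
}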

This could be set in a loop to find all of the Unicode escapes in the string.
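As a rough sketch of such a loop, a regular expression can do the "backslash, u, four hex digits" check and report where each match starts. The class and method names are invented for this example. It assumes the string contains the escape as literal text (a backslash followed by u), not the already-decoded character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeFinder {
    // A literal backslash, the letter u, then exactly four hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u[0-9A-Fa-f]{4}");

    // Returns the start index of every escape sequence found in the text.
    static List<Integer> findEscapeStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = ESCAPE.matcher(text);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The doubled backslashes make this a string of literal escape text.
        System.out.println(findEscapeStarts("I am a car: \\uD83D\\uDE97"));
    }
}
```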

Now, to display the character you found, you will want to replace the corresponding Unicode escape in your string with the single character it stands for. To do this, you will have to interpret the hexadecimal code as its character. This question already has a good explanation of that process.
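One possible way to interpret the code as its character, sketched under the assumption that every escape has exactly four hex digits: parse the digits with Integer.parseInt in base 16 and cast the result to char. The class and method names here are hypothetical. A surrogate pair written as two escapes turns back into the two code units that make up the emoji.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeDecoder {
    // Captures the four hex digits that follow a literal backslash and the letter u.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces each escape sequence in the text with the UTF-16 code unit it names.
    static String unescape(String text) {
        Matcher matcher = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (matcher.find()) {
            out.append(text, last, matcher.start());                   // text before the escape
            out.append((char) Integer.parseInt(matcher.group(1), 16)); // the decoded code unit
            last = matcher.end();
        }
        out.append(text, last, text.length()); // whatever follows the last escape
        return out.toString();
    }

    public static void main(String[] args) {
        // Whether the emoji displays correctly depends on the console encoding and font.
        System.out.println(unescape("I am a car: \\uD83D\\uDE97"));
    }
}
```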
