0

I am working on some SQL related to emojis, using data from the emojione release on github (https://github.com/emojione/emojione/releases/tag/4.0.0).

The emojis are given a unicode value - e.g. "castle" = 1f3f0.

Lots of emojis have skin tone variations - for example - person climbing () = 1f9d7, but also has 5 skin tone variations:

  1. - 1f9d7-1f3fb
  2. - 1f9d7-1f3fc
  3. - 1f9d7-1f3fd
  4. - 1f9d7-1f3fe
  5. - 1f9d7-1f3ff

The skin-tone variation is denoted by the 1f3fb part of the unicode value, and can appear as above - after the initial identifier for that emoji, or can sometimes be sandwiched between two segments of the unicode value - for example, man-mechanic (‍) has a unicode value of 1f468-1f527, but then the skin tone unicode values are as follows:

  1. ‍ Man Mechanic: Light Skin Tone - 1f468-1f3fb-1f527
  2. ‍ Man Mechanic: Medium-light Skin Tone - 1f468-1f3fc-1f527
  3. ‍ Man Mechanic: Medium Skin Tone - 1f468-1f3fd-1f527
  4. ‍ Man Mechanic: Medium-dark Skin Ton- 1f468-1f3fe-1f527
  5. ‍ Man Mechanic: Dark Skin Tone - 1f468-1f3ff-1f527

I'm trying to write some SQL to identify those emojis who have skin tone variations.

This is some sample data:

select my_data.*
from (select '1f468-1f9b3' unicode_value union all
      select '1f468-1f3fb-1f9b3'         union all
      select '1f468-1f3fc-1f9b3'         union all
      select '1f468-1f3fd-1f9b3'         union all
      select '1f468-1f3fe-1f9b3'         union all
      select '1f468-1f3ff-1f9b3'         union all
      select '1f474'                     union all
      select '1f474-1f3fb'               union all
      select '1f474-1f3fc'               union all
      select '1f474-1f3fd'               union all
      select '1f474-1f3fe'               union all
      select '1f474-1f3ff'               union all
      select '1f98d'                     union all
      select '1f40a'                     union all
      select '1f1fc-1f1f8') my_data;

Given the above sample data, is there any way to inly return these:

  1. 1f468-1f9b3
  2. 1f474

Since they are the only records with the 5 skin tone variants which appear immediately after the first segment of the unicode value?

It might not be possible to achieve this without writing functions or something complicated, or combining the SQL with a scripting language like PHP, but I wanted to ask here.

4532066
  • 2,042
  • 5
  • 21
  • 48

1 Answers1

0
SELECT UNHEX('F09F91A8E2808DF09F94A7');

F09F91A8 = man
E2808D = Zero Width Joiner
F09F94A7 = mechanic

Some references:
http://unicode.scarfboy.com/?s=1f3fd
https://codepoints.net/U+1f3fd
http://www.fileformat.info/info/unicode/char/1f3fd/index.htm

MySQL is friendlier to UTF-8 than to Unicode codepoints.

Rick James
  • 135,179
  • 13
  • 127
  • 222