
Yes, I know that this is a duplicate of a few existing questions. I have already fully acknowledged this with references in my original question introduction.

However, in 2019 the list was expanded with an additional set of 230 emojis, which I presume aren't covered by the existing answers / ranges.

http://www.unicode.org/Public/emoji/12.0/

I am not exactly familiar or comfortable with constructing Unicode ranges for removal, so if anyone knows how, could you post an updated version?

As per my understanding, the codes are a bit scattered around and cannot be easily defined from start to end in one single continuous range.


The best answer so far, updated in 2018, is here:

https://stackoverflow.com/a/41543705/2715309

text.replace(/([\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');
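For comparison, here is a minimal sketch of the same kind of removal written with Unicode property escapes instead of hand-written surrogate ranges. It is not taken from the linked answer. It assumes a JavaScript engine that supports `\p{Extended_Pictographic}` with the `u` flag, and `removeEmoji` is a hypothetical helper name. Which code points match depends on the Unicode data shipped with the engine, so Emoji 12.0 coverage is not guaranteed on older runtimes. Note also that `Extended_Pictographic` includes a few symbols such as © and ®, which may or may not be what you want.

```javascript
// Hypothetical helper: strips pictographic emoji from a string.
// Assumption: the engine supports \p{Extended_Pictographic} (requires the `u` flag).
function removeEmoji(text) {
  // Match one of:
  //   - any Extended_Pictographic code point
  //   - a skin tone modifier (U+1F3FB..U+1F3FF)
  //   - a regional indicator used in flags (U+1F1E6..U+1F1FF)
  // followed by any trailing variation selector-16 (U+FE0F) or
  // zero width joiner (U+200D) characters that glue emoji sequences together.
  const emojiPattern =
    /(?:\p{Extended_Pictographic}|[\u{1F3FB}-\u{1F3FF}\u{1F1E6}-\u{1F1FF}])[\uFE0F\u200D]*/gu;
  return text.replace(emojiPattern, '');
}

console.log(removeEmoji('Hi 😀!'));
console.log(removeEmoji('plain text 123'));
```

Keycap sequences (a digit followed by U+FE0F and U+20E3) are deliberately not handled here, since removing them would also mean deciding what to do with the digit itself.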

As you can see, I also posted a comment about the updated list, but the author hasn't updated the answer yet.


Additional similar questions:

jquery remove all emoji include new

Remove Emoji's from jquery

PHP : writing a simple removeEmoji function

dev101
  • Maybe it's better to write a regex to keep the characters you want to keep. – GolezTrol Mar 02 '19 at 01:44
  • I wish to keep all but emojis. – dev101 Mar 02 '19 at 01:45
  • I'd be less concerned about the unicode ranges which are relatively easy, and more concerned with enforcing consistent character encodings so that your strings in JavaScript actually contain what you expect and don't have broken surrogate pairs. – Patrick Roberts Mar 02 '19 at 01:49
  • Hi Patrick, can you elaborate a little? Any links? UTF-8 is used, so I'm not sure what you mean. Thanks – dev101 Mar 02 '19 at 01:50
  • Possible duplicate of [How to detect emoji using javascript](https://stackoverflow.com/questions/18862256/how-to-detect-emoji-using-javascript) – GolezTrol Mar 02 '19 at 01:50
  • One of the answers to the dupe question suggests [a library](https://github.com/mathiasbynens/emoji-regex) that can update itself with the modified unicode standards. Also, ~if~ when new ranges are added, you can just update your copy of the library, instead of having to ask here again. ;) – GolezTrol Mar 02 '19 at 01:51
  • GolezTrol, obviously I already posted this in my original question, with the best questions/answers already linked. Please read the question more carefully before downvoting. Thank you – dev101 Mar 02 '19 at 01:52
  • Do you know what the codes of the 230 new emojis are? You could just add them to the regex, each prepended with the or operator (`|`), right before this part: `)/g, '')`. `[\u2011-\u26FF]` means the range from `\u2011` to `\u26FF`, for example. – Andre Figueiredo Mar 02 '19 at 01:53
  • GolezTrol, thanks for the library. – dev101 Mar 02 '19 at 01:53
  • Andre, if I knew that, I wouldn't be asking this question. The problem is that I am not sure about the codes of the new symbols; every news and blog outlet out there just listed images. – dev101 Mar 02 '19 at 01:54
  • @dev101 You did not refer to the question I linked anywhere in your question. The question I linked to has a working answer, which I emphasized with the comment. Please read the comment before complaining about it. – GolezTrol Mar 02 '19 at 01:55
  • @dev101 You could work on making yourself clearer then. We're trying to help you know. – Andre Figueiredo Mar 02 '19 at 01:56
  • @GolezTrol Thanks, it's a link to an external library, not exactly the answer I expected. Thanked already. – dev101 Mar 02 '19 at 01:58
  • Forgot to post this: http://www.unicode.org/Public/emoji/12.0/ – dev101 Mar 02 '19 at 01:59
  • @AndreFigueiredo Thanks, sorry I forgot to post a link to unicode version 12.0 complete list when asking the question (had it already prepared). See the complete official documentation, maybe it will help you. – dev101 Mar 02 '19 at 02:01
  • @GolezTrol If you post an example applicable to JavaScript AND show how to actually use that library, instead of just an external resource link, I will accept it. Thank you – dev101 Mar 02 '19 at 02:04
  • @GolezTrol I tested the mentioned library today. Unfortunately, it is obsolete at the moment (based on Emoji v11.0, not v12.0), and it actually lets a few more emojis through than the answer linked in my original question. I will have to manually update the dependency and regex data. – dev101 Mar 02 '19 at 19:35
  • @dev101 With a bit of creativity one can get https://www.unicode.org/Public/emoji//*.txt, then parse the files, because they're in a well-defined structure, then do `/(||<...>)/`. If you're having issues parsing, think about breaking each file into lines, then cutting out everything after "#" or ";" – Andre Figueiredo Mar 02 '19 at 21:40
  • @AndreFigueiredo Thanks, I already know the solution now, so no need for that creativity part because the code already exists. I am just giving GolezTrol a chance to post an answer, instead of taking the points and credit for it myself. On a side note, it's a bit of a shame this question was downvoted so fast that at one point it was on the edge of being closed, without proper reason imho *sigh*. – dev101 Mar 03 '19 at 04:45
  • @dev101 Good to know. Just for fun, I made one that builds a non-optimized regex from the latest data available at the link: https://repl.it/@AndreFigueiredo/DeliriousMealyDeeplearning – Andre Figueiredo Mar 03 '19 at 17:42
  • After testing the latest version of the regex library suggested above, it apparently still works worse than the solution posted in my question introduction, failing to match certain characters. There are also other issues reported with that library on GitHub. – dev101 Mar 05 '19 at 20:48
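
The parsing approach described in the comments (split the data file into lines, drop everything after "#", split on ";") can be sketched roughly as follows. This is only an illustration under stated assumptions: `buildEmojiRegex` is a hypothetical helper, the `sample` text merely imitates the `code points ; property # comment` layout of the Unicode emoji data files and is not the real file, and downloading the actual data is left out. Be careful when choosing the property on real data: the `Emoji` property also covers the digits 0-9, `#` and `*`, so filtering on it would strip those too.

```javascript
// Hypothetical helper: builds a character-class regex from text laid out like
// the Unicode emoji data files ("<code point or range> ; <property> # comment").
function buildEmojiRegex(dataText, property = 'Extended_Pictographic') {
  const parts = [];
  for (const rawLine of dataText.split('\n')) {
    // Cut off the trailing comment, then skip blank and comment-only lines.
    const line = rawLine.split('#')[0].trim();
    if (!line) continue;

    // Left of ";" is the code point (or range), right of it is the property name.
    const [range, prop] = line.split(';').map((field) => field.trim());
    if (prop !== property) continue;

    // "1F601..1F606" is a range; a single code point has no "..".
    const [start, end = start] = range.split('..');
    parts.push(start === end ? `\\u{${start}}` : `\\u{${start}}-\\u{${end}}`);
  }
  // The `u` flag is required for the \u{...} code point escapes.
  return new RegExp(`[${parts.join('')}]`, 'gu');
}

// Made-up sample in the same layout as the real data files.
const sample = [
  '# sample only, not the real data file',
  '1F600         ; Extended_Pictographic # grinning face',
  '1F601..1F606  ; Extended_Pictographic # a range of faces',
  '1F971         ; Extended_Pictographic # yawning face',
].join('\n');

const emojiRegex = buildEmojiRegex(sample);
console.log('a😀b🥱c'.replace(emojiRegex, ''));
```

A single character class like this ignores multi-code-point sequences (skin tones, ZWJ sequences, flags), so leftover modifiers or joiners would need separate handling.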
