I use the following regular expression to remove emojis:

"[\ud83c\udc00-\ud83c\udfff]|[\ud83d\udc00-\ud83d\udfff]|[\u2600-\u27ff]"

It works well in Java, but I get compile errors in C++:

\ud83c is not a valid Unicode character
\udc00 is not a valid Unicode character
\ud83c is not a valid Unicode character
\udfff is not a valid Unicode character
\ud83d is not a valid Unicode character
\udc00 is not a valid Unicode character
\ud83d is not a valid Unicode character
\udfff is not a valid Unicode character

How do I use this regular expression in C++?

Jarod42
lsz
  • Show your code. – Shawn Nov 07 '19 at 02:58
  • And aren't most of those codepoints part of UTF-16 surrogate pairs? They're not valid by themselves, so the warnings make sense if you're using a UTF-8 encoding. – Shawn Nov 07 '19 at 03:03
  • In Java, `\u` defines a code unit, but in C++, `\u` defines a code point. – Raymond Chen Nov 07 '19 at 03:42
  • Raw strings are also the way to go for regex in C++ – sweenish Nov 07 '19 at 03:45
  • That only matches a few emojis. Better regex: https://stackoverflow.com/a/58718505/46395 – daxim Nov 07 '19 at 18:06
  • Also, Java strings are Unicode. In C++, to use `\u` characters, make sure you are using a Unicode `wchar_t`/`char16_t`/`char32_t` string, not an ANSI `char` string (unless you are using UTF-8 and a regex parser that supports UTF-8) – Remy Lebeau Nov 07 '19 at 20:06
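
The advice in the comments can be sketched as follows. In Java, `\ud83c\udc00` etc. are UTF-16 surrogate pairs: `[\ud83c\udc00-\ud83c\udfff]` covers U+1F000–U+1F3FF and `[\ud83d\udc00-\ud83d\udfff]` covers U+1F400–U+1F7FF. In C++, `\u`/`\U` escapes name code points, so surrogate values are rejected; the fix is to write the real code-point ranges. This is only a sketch, assuming a platform where `wchar_t` is 32 bits (e.g. Linux) so each emoji fits in one `wchar_t`; the function name `strip_emoji` is mine, not from the question:

```cpp
#include <regex>
#include <string>

// Sketch: assumes 32-bit wchar_t (e.g. Linux/macOS). On Windows, where
// wchar_t is 16 bits, the \U escapes below would be encoded as surrogate
// pairs and this character class would not work as written.
std::wstring strip_emoji(const std::wstring& in) {
    // Java's [\ud83c\udc00-\ud83c\udfff] is U+1F000-U+1F3FF and
    // [\ud83d\udc00-\ud83d\udfff] is U+1F400-U+1F7FF, so the two
    // surrogate-pair ranges collapse into one code-point range;
    // \u2600-\u27FF is already a plain BMP range and stays the same.
    static const std::wregex emoji(L"[\U0001F000-\U0001F7FF\u2600-\u27FF]");
    return std::regex_replace(in, emoji, L"");
}
```

Alternatively, keep the strings as UTF-8 `char` data and use a regex library with real Unicode support (e.g. one that accepts `\x{1F600}`-style escapes), as the comments suggest.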

0 Answers