I have a $text = "Hello üäö$"
I want to remove only the emojis from the text using XQuery. How can I do that?
expected result : "Hello üäö$"
I tried:
replace($text, '[^\x00-\xFFFF]', '')
but it didn't work.
Thanks in advance :)
To replace emoji, you can make use of XPath's support for Character Class Escapes, specifically Category and Block Escapes, to match named Unicode blocks:
replace("Hello üäö$", "\p{IsEmoticons}", "")
This returns the expected result:
Hello üäö$
The "Emoticons" block doesn't contain all characters commonly associated with "emoji." For example, 💜 (Purple Heart, U+1F49C) belongs, according to a Unicode lookup site such as https://www.compart.com/en/unicode/U+1F49C, to the block:
Miscellaneous Symbols and Pictographs, U+1F300 - U+1F5FF
This block is not available in XPath or XQuery processors, since it is listed neither in the XML Schema 1.0 spec linked above nor in Unicode block names for use in XSD regular expressions, the list of blocks that XPath and XQuery processors conforming to XML Schema 1.1 are required to support.
For characters from blocks not available in XPath or XQuery, you can manually construct character classes. For example, given the purple heart character above, we can match it as follows:
replace("Purple 💜 heart", "[&#x1F300;-&#x1F5FF;]", "")
This returns the expected result (the spaces on either side of the removed character remain):
Purple  heart
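The same technique extends to several ranges in one character class. Here is a minimal sketch, assuming a hypothetical helper named local:strip-emoji and assuming that the only blocks you care about are Emoticons (U+1F600 to U+1F64F) and Miscellaneous Symbols and Pictographs (U+1F300 to U+1F5FF). The function name and the choice of ranges are illustrative, not a complete definition of "emoji":

```xquery
xquery version "3.1";

(: Hypothetical helper; the name and the choice of ranges are assumptions. :)
declare function local:strip-emoji($text as xs:string) as xs:string {
  (: U+1F300 to U+1F5FF: Miscellaneous Symbols and Pictographs
     U+1F600 to U+1F64F: Emoticons :)
  replace($text, "[&#x1F300;-&#x1F5FF;&#x1F600;-&#x1F64F;]", "")
};

(: The input contains U+1F600 (grinning face) and U+1F49C (purple heart),
   written as character references. :)
local:strip-emoji("Hello &#x1F600; &#x1F49C; üäö$")
```

Only the two emoji are removed. The surrounding spaces and the characters üäö$ are left in place, so wrap the call in normalize-space() if you also want to collapse the leftover whitespace.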
If you're wondering why we use &#x1F300; and not U+1F300 or \x1F300, it is because, as Michael Kay noted above, "XQuery uses the XML escape convention &#xFFFF;, not the C/Java escape convention \xFFFF."
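With that escape convention in mind, the intent of the attempt in the question (dropping everything above U+FFFF) could be sketched as follows. This assumes that "everything outside the Basic Multilingual Plane" is an acceptable approximation of "emoji". It is not exact, because it also removes non-emoji supplementary characters and leaves symbols below U+10000, such as U+2764 (Heavy Black Heart), untouched. The range is written as U+10000 to U+10FFFF rather than as a negation of U+0000 to U+FFFF, because character references in XQuery must point to valid XML characters, which excludes U+0000 and U+FFFF:

```xquery
xquery version "3.1";

(: U+1F600 (grinning face) is written as a character reference. :)
let $text := "Hello &#x1F600; üäö$"
(: Remove every character from U+10000 to U+10FFFF, i.e. outside the BMP. :)
return replace($text, "[&#x10000;-&#x10FFFF;]", "")
```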
(I've updated the answer in response to the other very helpful comments.)