I am working with data received from Google BigQuery that contains some emoji. I have code that removes emoji, but it does not work for the specific emoji shown below.
Sample code that removes most emoji, but not the one in the second case below:
Version used:
Python 3.9
from re import UNICODE, compile
emoji_pattern = compile("["
    u"\U0001F600-\U0001F64F" # emoticons
    u"\U0001F300-\U0001F5FF" # symbols & pictographs
    u"\U0001F680-\U0001F6FF" # transport & map symbols
    u"\U0001F1E0-\U0001F1FF" # flags (iOS)
    u"\U0001F1F2-\U0001F1F4" # Macau flag
    u"\U0001F1E6-\U0001F1FF" # flags
    u"\U0001F600-\U0001F64F"
    u"\U00002702-\U000027B0"
    u"\U000024C2-\U0001F251"
    u"\U0001f926-\U0001f937"
    u"\U0001F1F2"
    u"\U0001F1F4"
    u"\U0001F620"
    u"\u200d"
    u"\u2640-\u2642"
    "]+", flags=UNICODE)
# Works for this one
data = 'support.google.co.uk/s/.'
result = emoji_pattern.subn(r'', data)
# result --> ('support.google.co.uk/s/.', 1)
# Doesn't work in this case
data = 'www.google.co.uk/?'
result = emoji_pattern.subn(r'', data)
# result --> ('www.google.co.uk/?', 0)
Can someone help me with this case? It would also be very helpful to know how to check the Unicode code point of any special character or emoji in Python 3.9, so that I can add such code points to the emoji pattern.
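
To illustrate the kind of check being asked about, here is a minimal sketch that lists the code point and Unicode name of every non-ASCII character in a string, using only the standard `unicodedata` module. The helper name `describe_chars` is hypothetical, and the sample string uses U+1F9D0 purely as a stand-in, since the actual emoji from the data is not shown here.

```python
import unicodedata


def describe_chars(text):
    """Return (char, 'U+XXXX', name) for each non-ASCII character in text."""
    rows = []
    for ch in text:
        if ord(ch) > 127:
            # unicodedata.name() raises ValueError for unnamed characters
            # unless a default is supplied as the second argument.
            name = unicodedata.name(ch, "<unnamed>")
            rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows


# Stand-in sample: U+1F9D0 is an assumption, not the emoji from the real data.
sample = "www.google.co.uk/\U0001F9D0"
for ch, code, name in describe_chars(sample):
    print(code, name)
```

If a code point printed this way falls outside every range in the character class, the pattern cannot match it, which would explain a substitution count of 0. Note that a single visible emoji may consist of several code points (for example a base character followed by a variation selector or joined with `\u200d`), so the helper can return more than one row per emoji.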