1

I run this file, test.py, with my Sublime Text venv Python build system:

import re

text = "skull ☠️..."
print(text)
print(repr(text))

x = re.sub(r' *[\u2600-\u26FF]', r'', text)
print(x)
print(repr(x))

and see the expected output in the Sublime window:

skull ☠️...
'skull ☠️...'
skull️...
'skull️...'

But when I run the same file from the command line in Windows 10, I get strange question marks:

enter image description here

In Google Colab it also works as expected:

enter image description here

There is an invisible symbol at index 5:

enter image description here enter image description here

What's happening here? How can I remove ☠️ without any question marks or zero-width symbols left in its place?

dereks

2 Answers

3

To identify the character that is left over, you can paste it into an online tool like this one.

The leftover character is U+FE0F: VARIATION SELECTOR-16 [VS16] {emoji variation selector}
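As an alternative to an online tool, the same lookup can be sketched in Python with the standard `unicodedata` module. The helper name `describe` is made up for this illustration, and the string is the one from the question written with escapes so the hidden character is visible in the source.

```python
import unicodedata

# The question's string, written with escapes: skull, space, U+2620, U+FE0F, dots
text = "skull \u2620\ufe0f..."

def describe(s):
    """Return (index, code point, Unicode name) for every character in s."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(s)
    ]

for row in describe(text):
    print(row)
```

In this string the skull sits at index 6 and the variation selector at index 7, so after the original substitution removes the space and the skull, the selector ends up at index 5.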

You can match or replace it with: \uFE0F

Combined with your current pattern: [\u2600-\u26FF\uFE0F]
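A minimal sketch of the combined pattern applied to the question's string, keeping the leading ` *` from the original regex. The variable names are only for illustration. Note that this also strips any U+FE0F that appears elsewhere in the text, which may or may not be what you want.

```python
import re

# The question's string, written with escapes so the selector is visible
text = "skull \u2620\ufe0f..."

# Miscellaneous Symbols (U+2600-U+26FF) plus VARIATION SELECTOR-16 (U+FE0F)
pattern = re.compile(r' *[\u2600-\u26FF\uFE0F]')

# First match removes " " + skull, second match removes the leftover selector
cleaned = pattern.sub('', text)
print(repr(cleaned))
```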

bobble bubble
  • Thank you! It looks like it's better to remove the whole range of selectors `[\u2600-\u26FF\uFE00-\uFE0F]`. – dereks Dec 29 '20 at 21:25
0
  1. The Windows command prompt is a text user interface, so why do you want to output graphic symbols like emojis on a pure text interface at all? The font configured for the Windows console window must support the characters and symbols you want to see in it. So you simply have to add a custom font to cmd that can draw this emoji. Here is a guide on adding custom fonts to the command prompt: https://www.maketecheasier.com/add-custom-fonts-command-prompt-windows10/

  2. The default Windows console host (conhost.exe) does not support printing Unicode characters. However, the new Windows Terminal does. Run that code in Windows Terminal (wt.exe), because it has full Unicode support, as per this answer: does all windows command prompt not support emoji?

  3. This is a very nice article about What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ It will help you understand the encoding used by each Windows version. I hope this helps.
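To check whether the question marks come from the console rather than from the string itself, one option is to inspect the encoding Python uses for standard output and to print an ASCII-escaped form of the result. This is only a diagnostic sketch; the string `x` below is assumed to be the regex result from the question, including the leftover selector.

```python
import sys

# Assumed regex result from the question, with the leftover U+FE0F
x = "skull\ufe0f..."

# Encoding Python uses when writing to this console or pipe
print(sys.stdout.encoding)

# ascii() escapes every non-ASCII character, so this prints the same in any console
escaped = ascii(x)
print(escaped)
```

If the escaped form shows `\ufe0f`, the character is still in the string and the question mark is just how that console renders it.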

  • I don't want to print emoji in the command line. I want to remove ☠️ completely, without question marks in its place. For some reason `r' *[\u2600-\u26FF]'` removes only part of this skull emoji. – dereks Dec 29 '20 at 21:02
  • I apologize for the misunderstanding. I think we should now look for the right range to include in our regex. – Dalia Elbanna Dec 29 '20 at 21:17
  • It is already in the right range ☠️ `\u2620`. – dereks Dec 29 '20 at 21:19