Special Unicode Characters are not removed in Python 3

Question

I have a list, keys, containing words. When I run this loop:

for key in keys:
  print(key)

I get normal output in the terminal.

But when I print the entire list using print(keys), I get this output:

I have tried key.replace("\u202c", ''), key.replace("\\u202c", ''), and re.sub(u'\u202c', '', key), but none of them solved the problem. I also tried the solutions from the following questions, but none of them worked either:

Replacing a unicode character in a string in Python 3

Removing unicode \u2026 like characters in a string in python2.7

Python removing extra special unicode characters

How can I remove non-ASCII characters but leave periods and spaces using Python?

I scraped this from Google Trends using Beautiful Soup and retrieved the text with get_text(). In the page source of the Google Trends page, the words are listed as follows:

When I pasted the text here directly from the page source, it pasted without these unusual symbols.

@OferSadan I just tried this, getting the same output as in the question. — Himanshu Ahuja, Jul 12 '17 at 00:21
Do a `sub(r'\p{Block=General_Punctuation}+','')` on each item in the list after it is generated. Or, you can use the range `[\u2000-\u206F]+` which is the _block_. See https://www.compart.com/en/unicode/block/U+2000 — , Jul 12 '17 at 01:17
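A minimal sketch of the range-based suggestion in the comment above, using only the standard library. The standard `re` module does not accept `\p{...}` property classes, so this sketch uses the explicit range instead. The helper name `strip_general_punctuation` and the sample `keys` list are assumptions for illustration, not the asker's real data. Be aware that the U+2000 to U+206F block also contains visible characters such as the ellipsis (U+2026), dashes, and curly quotes, so this removes more than the invisible U+202C.

```python
import re

# General Punctuation block, U+2000 through U+206F (includes U+202C).
GENERAL_PUNCTUATION = re.compile("[\u2000-\u206f]+")


def strip_general_punctuation(text):
    """Return text with every General Punctuation character removed."""
    return GENERAL_PUNCTUATION.sub("", text)


# Sample data standing in for the scraped list (hypothetical).
keys = ["\u202cABCD", "\u202cXYZ\u202c"]
cleaned = [strip_general_punctuation(key) for key in keys]
print(cleaned)  # ['ABCD', 'XYZ']
```

If only U+202C needs to go, a plain `key.replace("\u202c", "")` is narrower and avoids touching legitimate punctuation.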

riteshtch · Accepted Answer · 2017-07-12T00:28:52.070


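You can just strip out the characters using str.strip. Note that strip only removes them from the start and end of each string; a character in the middle of a word needs replace instead.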

>>> keys=['\u202cABCD', '\u202cXYZ\u202c']
>>> for key in keys:
...     print(key)
... 
ABCD
XYZ‬
>>> newkeys=[key.strip('\u202c') for key in keys]
>>> print(keys)
['\u202cABCD', '\u202cXYZ\u202c']
>>> print(newkeys)
['ABCD', 'XYZ']
>>>

I tried one of your methods, str.replace, and it does work for me:

>>> keys
['\u202cABCD', '\u202cXYZ\u202c']
>>> newkeys=[]
>>> for key in keys:
...     newkeys += [key.replace('\u202c', '')]
... 
>>> newkeys
['ABCD', 'XYZ']
>>>
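The thread does not establish why the earlier attempts appeared to fail, so the following is only a sketch of some possible explanations, with a sample `keys` list assumed for illustration. First, printing a list shows the `repr()` of each item, which escapes the invisible U+202C, while printing a single string writes the character itself. Second, strings are immutable, so `replace` returns a new string and the result has to be stored. Third, `"\\u202c"` is six literal characters (a backslash, `u`, `2`, `0`, `2`, `c`), which never occur in the data.

```python
# Sample data standing in for the scraped list (hypothetical).
keys = ["\u202cABCD", "\u202cXYZ\u202c"]
key = keys[0]

# print(keys) uses repr() for each item, so U+202C appears as an escape.
print(repr(key))  # '\u202cABCD'

# The result of replace() is discarded here, so key is unchanged.
key.replace("\u202c", "")
unchanged = key

# Storing the result is what actually removes the character.
fixed = key.replace("\u202c", "")

# The escaped form is a different, six-character string.
literal_found = "\\u202c" in key

# Rebuilding the list applies the fix to every item.
newkeys = [k.replace("\u202c", "") for k in keys]
print(newkeys)  # ['ABCD', 'XYZ']
```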

edited Jul 12 '17 at 00:28

answered Jul 12 '17 at 00:25


This worked out for me! Any insights on why the methods I tried earlier didn't work? – Himanshu Ahuja Jul 12 '17 at 00:28
@HimanshuAhuja I tried 1 of your methods, it does work for me in python3 – riteshtch Jul 12 '17 at 00:32
