I have a list, keys, that contains words. When I run this loop:

for key in keys:
  print(key)

I get normal output in the terminal, but when I print the entire list using print(keys), the output contains \u202c escape sequences around the words.

I have tried key.replace("\u202c", ''), key.replace("\\u202c", '') and re.sub(u'\u202c', '', key), but none of them solved the problem. I also tried the solutions in the following questions, but none of them worked either:

Replacing a unicode character in a string in Python 3

Removing unicode \u2026 like characters in a string in python2.7

Python removing extra special unicode characters

How can I remove non-ASCII characters but leave periods and spaces using Python?

I scraped this from Google Trends using Beautiful Soup and retrieved the text with get_text(). The words are also listed in the page source of the Google Trends page. When I pasted the text here directly from the page source, it pasted without these unusual symbols.

  • @OferSadan I just tried this, getting the same output as in the question. – Himanshu Ahuja Jul 12 '17 at 00:21
  • Do a `sub(r'\p{Block=General_Punctuation}+','')` on each item in the list after it is generated. Or, you can use the range `[\u2000-\u206F]+` which is the _block_. See https://www.compart.com/en/unicode/block/U+2000 –  Jul 12 '17 at 01:17
  • See also https://en.wikipedia.org/wiki/General_Punctuation –  Jul 12 '17 at 01:24
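The range suggested in the comments above can be tried with the standard re module. Note that re does not accept the `\p{...}` property syntax, so the explicit range is used here. This is only a minimal sketch: the sample keys are made up, and the name GENERAL_PUNCTUATION is a hypothetical helper, not something from the question.

```python
import re

# Hypothetical sample data standing in for the scraped keys.
keys = ['\u202cABCD', '\u202cXYZ\u202c']

# One or more characters from the General Punctuation block (U+2000 to U+206F).
GENERAL_PUNCTUATION = re.compile('[\u2000-\u206F]+')

# re.sub returns a new string, so collect the results in a new list.
cleaned = [GENERAL_PUNCTUATION.sub('', key) for key in keys]
print(cleaned)  # ['ABCD', 'XYZ']
```

Be aware that this block also contains visible punctuation such as dashes, curly quotes and the ellipsis, so the pattern removes those as well.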

1 Answer

You can strip the characters from both ends of each string using str.strip:

>>> keys=['\u202cABCD', '\u202cXYZ\u202c']
>>> for key in keys:
...     print(key)
... 
ABCD
XYZ‬
>>> newkeys=[key.strip('\u202c') for key in keys]
>>> print(keys)
['\u202cABCD', '\u202cXYZ\u202c']
>>> print(newkeys)
['ABCD', 'XYZ']
>>> 
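Keep in mind that strip only removes characters at the start and end of a string. A small sketch with a made-up value (not taken from the question) shows the difference when the character also sits in the middle of a word:

```python
# Hypothetical value with U+202C at both ends and inside the word.
key = '\u202cAB\u202cCD\u202c'

stripped = key.strip('\u202c')        # cleans only the two ends
replaced = key.replace('\u202c', '')  # removes every occurrence

# ascii() makes the invisible character visible as an escape sequence.
print(ascii(stripped))  # 'AB\u202cCD'
print(ascii(replaced))  # 'ABCD'
```

If the scraped text might contain the character anywhere, replace is the safer choice.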

I tried one of your methods, and it does work for me (note that replace returns a new string, so the result has to be stored):

>>> keys
['\u202cABCD', '\u202cXYZ\u202c']
>>> newkeys=[]
>>> for key in keys:
...     newkeys += [key.replace('\u202c', '')]
... 
>>> newkeys
['ABCD', 'XYZ']
>>> 
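As for why the two outputs differ: print(keys) shows the repr of each item, and repr escapes non-printable characters such as U+202C, while print(key) writes the character itself, which the terminal renders as nothing. If other invisible characters might show up too, one possible generalisation is to drop every character in the Unicode "format" category (Cf). This is a sketch under that assumption; remove_format_chars is a hypothetical helper name and the sample data is made up.

```python
import unicodedata

def remove_format_chars(text):
    """Return text without characters whose Unicode category is Cf (format)."""
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')

# Hypothetical sample data; U+202C and U+200E are both category Cf.
keys = ['\u202cABCD', '\u202cXYZ\u202c', '\u200eGHI']

newkeys = [remove_format_chars(key) for key in keys]
print(newkeys)  # ['ABCD', 'XYZ', 'GHI']
```

This also removes characters such as the zero-width joiner, which may matter for some scripts and emoji sequences, so check that it suits your data.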
riteshtch