29

I have a string in Python like this:

u'\u200cHealth & Fitness'

How can I remove the \u200c part from the string?

joanis
V.Anh

5 Answers

51

You can encode it to ASCII and ignore errors:

u'\u200cHealth & Fitness'.encode('ascii', 'ignore')

Output (Python 2; in Python 3 the result is the bytes object b'Health & Fitness'):

'Health & Fitness'
Arount
  • 8
    This obviously works in the above example but you are forcing the string into ascii losing all unicode chars, which obviously is not a solution that works for all – Martin Massera Jul 28 '19 at 14:05
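A quick sketch of the caveat raised in the comment above, assuming Python 3 and a made-up sample string: encoding to ASCII with errors ignored drops every non-ASCII character, not just the zero-width non-joiner, and it returns bytes rather than str.

```python
# Hypothetical sample: a zero-width non-joiner followed by accented text.
sample = '\u200cCafé & Crème brûlée'

# encode() returns bytes in Python 3; 'ignore' silently drops every
# character that has no ASCII representation.
ascii_only = sample.encode('ascii', 'ignore').decode('ascii')

# Targeted removal leaves the other non-ASCII characters alone.
targeted = sample.replace('\u200c', '')

print(ascii_only)  # Caf & Crme brle
print(targeted)    # Café & Crème brûlée
```

So the ASCII round trip is only safe when the text is known to contain nothing else worth keeping outside the ASCII range.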
32

If you have a string that contains a non-ASCII Unicode character, like

s = "Airports Council International \u2013 North America"

then you can try:

newString = (s.encode('ascii', 'ignore')).decode("utf-8")

and the output will be (the dash is dropped, but the spaces on either side of it remain):

Airports Council International  North America

Upvote if helps :)

Hayat
22

I just use replace, since I don't need the character:

varstring.replace('\u200c', '')

Or in your case:

u'\u200cHealth & Fitness'.replace('\u200c', '')
joanis
  • 9
    This is actually better than the accepted answer in most strings. The \u200c is a zero width non joiner, which is an unusual whitespace-type character that `strip()` ignores. In most cases with unicode strs you do not want to `encode(ascii, ignore)`. – Chet Mar 28 '19 at 15:41
  • 2
    This is general solution since ascii may remove some other Unicode characters as well. – prosti Dec 03 '19 at 14:31
  • appreciate this! – user3768258 Aug 26 '23 at 02:13
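If several invisible characters need to go while the rest of the text stays intact, the same idea extends to str.translate. This is a minimal sketch, assuming Python 3; the set of characters and the helper name strip_zero_width are illustrative choices, not something taken from the answers here. Zero-width joiners and non-joiners carry meaning in some scripts (Persian, for example), so removing them is only safe when you know you don't need them.

```python
# Illustrative choice of invisible characters to drop: zero width space,
# zero width non-joiner, zero width joiner, and the BOM / zero width
# no-break space. dict.fromkeys maps each code point to None.
ZERO_WIDTH = dict.fromkeys(map(ord, '\u200b\u200c\u200d\ufeff'))


def strip_zero_width(text):
    """Return text with the characters listed in ZERO_WIDTH removed."""
    # In a translation table, a code point mapped to None is deleted.
    return text.translate(ZERO_WIDTH)


print(strip_zero_width('\u200cHealth\u200b & Fitness'))  # Health & Fitness
```

Unlike the ASCII round trip, this leaves every other Unicode character untouched.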
4

For me, the following worked:

mystring.encode('ascii', 'ignore').decode('unicode_escape')
Diana
  • 2
    You could improve your answer by explaining _why_ this code works, and what you're doing here. That way, others can be educated. – RyanZim Dec 11 '18 at 13:44
  • tbh, that was a 'Frankenstein' version of all answers that I had previously found but which didn't work. I can't really explain why this one worked over the rest of solutions in my case.. – Diana Oct 23 '19 at 11:19
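One possible reading of why this combination can appear to work, sketched for Python 3 with a made-up input: the first step drops real non-ASCII characters and yields bytes, and the unicode_escape codec then turns literal backslash escapes left in those bytes into the characters they name. Whether this matches the input behind this answer is an assumption.

```python
# Hypothetical input: a real zero-width non-joiner plus the six literal
# characters backslash, u, 0, 0, 2, 6 (note the doubled backslash).
mystring = '\u200cHealth \\u0026 Fitness'

# Step 1: drop everything that is not ASCII; the result is bytes.
as_bytes = mystring.encode('ascii', 'ignore')

# Step 2: interpret backslash escapes found in those bytes.
result = as_bytes.decode('unicode_escape')

print(as_bytes)  # b'Health \\u0026 Fitness'
print(result)    # Health & Fitness
```

The second step reinterprets any backslash already present in the text, so replace remains the safer tool when only \u200c has to go.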
2

In the specific case in the question, where the string is prefixed with a single u'\u200c' character, the solution is as simple as taking a slice that does not include the first character:

original = u'\u200cHealth & Fitness'
fixed = original[1:]

If the leading character may or may not be present, str.lstrip may be used:

original = u'\u200cHealth & Fitness'
fixed = original.lstrip(u'\u200c')

The same solutions work in Python 3. From Python 3.9, str.removeprefix is also available:

original = u'\u200cHealth & Fitness'
fixed = original.removeprefix(u'\u200c')
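The two methods differ when the character is repeated. A small sketch, assuming Python 3.9 or later and a made-up input with two leading zero-width non-joiners: lstrip treats its argument as a set of characters and removes every leading occurrence, while removeprefix removes the exact prefix at most once.

```python
# Hypothetical input with the zero-width non-joiner repeated twice.
doubled = '\u200c\u200cHealth & Fitness'

# lstrip removes all leading characters found in the given set.
stripped = doubled.lstrip('\u200c')

# removeprefix (Python 3.9+) removes the given prefix once, if present.
unprefixed = doubled.removeprefix('\u200c')

print(len(doubled))     # 18
print(len(stripped))    # 16
print(len(unprefixed))  # 17
```

Both return the string unchanged when it does not start with the character.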
snakecharmerb