
I have searched for a solution online, but this question is different: I don't want to remove all non-ASCII characters, only a specific subset of them.

I have a line that looks like this:

"[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"

I want to remove only these chars:

'…' , '⌉' , '⌈'

The text is from here.

I tried to solve it using replace, but whenever I write one of these non-ASCII characters in my source file I get the following error:

SyntaxError: Non-ASCII character '\xe2' in file C:/-------.py on line --, but no encoding declared;

Thanks in advance.

Yonlif

2 Answers

'[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]'.encode().decode('ascii', errors='ignore')

out:

'[x+]4 gur Id l gal sik-kt  x x  []'

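Use encode to convert the string to bytes, then decode it as ASCII and ignore the errors. Note that this drops every non-ASCII character, including ú and á, not just the three listed in the question.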

To remove only the specific characters, I think you should use re.sub:

import re

text = "[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"

re.sub('[…⌉⌈]', '', text)  # removes every character listed inside the [] character class

out:

'[x+]4 gur Id lú gal sik-kát  x x  []'
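If the non-ASCII literals in the source file are what trigger the SyntaxError, one option is to write the three characters as Unicode escapes so the file stays pure ASCII. The sketch below is only an illustration: the names `UNWANTED`, `strip_marks` and `sample` are made up for this example, and `sample` is a shortened stand-in for the line in the question, not the original text.

```python
import re

# The three characters written as escapes, so this file contains only ASCII:
# U+2026 HORIZONTAL ELLIPSIS, U+2308 LEFT CEILING, U+2309 RIGHT CEILING.
UNWANTED = re.compile(u'[\u2026\u2308\u2309]')


def strip_marks(text):
    """Remove only the three marks; other non-ASCII letters are kept."""
    return UNWANTED.sub(u'', text)


# Shortened, hypothetical sample (U+00FA is the accented u from the question).
sample = u"l\u00fa \u2308 x x \u2309 [\u2026\u2026]"
cleaned = strip_marks(sample)
print(repr(cleaned))
```

The `u''` prefix is accepted by Python 2 and by Python 3.3 and later, so the same pattern should work on either as long as the input is a Unicode string rather than raw bytes.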
宏杰李

Use str.translate:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

s = "[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"
r = s.translate(None, '…⌉⌈')

print(r)
# [x+]4 gur Id lú gal sik-kát  x x  []
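The two-argument call `s.translate(None, ...)` is the Python 2 form for byte strings. On Python 3, `str.translate` takes a single translation table, which can be built with `str.maketrans`. A minimal sketch of the equivalent, assuming Python 3 (the name `DELETE_TABLE` is just for illustration):

```python
# Python 3: the third argument of str.maketrans lists characters to delete.
DELETE_TABLE = str.maketrans('', '', '…⌉⌈')

s = "[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"
r = s.translate(DELETE_TABLE)  # ú and á are not in the table, so they stay

print(r)
```

Python 3 reads source files as UTF-8 by default, so the literal characters are fine here as long as the file is saved as UTF-8.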
SparkAndShine
  • it's not working - SyntaxError: Non-ASCII character '\xe2' in file C:/----------.py on line 57, but no encoding declared; – Yonlif Feb 03 '17 at 10:47
  • @Yonlif, add `# -*- coding: utf-8 -*-` at the beginning of the file. Check the updated answer. – SparkAndShine Feb 03 '17 at 11:00
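To make the fix from the comments concrete, here is a rough sketch of a whole file that uses the `replace` approach from the question. It assumes the file is saved as UTF-8; the helper name `remove_chars` is invented for this example. Per PEP 263, the encoding declaration has to be on the first or second line of the file.

```python
# -*- coding: utf-8 -*-
# PEP 263: the declaration above must be on line 1 or 2 of the file.


def remove_chars(text, unwanted):
    """Delete every character of `unwanted` from `text` using str.replace."""
    for ch in unwanted:
        text = text.replace(ch, u'')
    return text


# u'' literals keep the text as Unicode on Python 2 as well as Python 3.
s = u"[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"
result = remove_chars(s, u'…⌉⌈')
print(repr(result))
```

Using Unicode literals matters on Python 2: with plain byte strings, each of these characters is three bytes, and byte-level operations can accidentally touch other characters that share a byte.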