Python: removing specific non-ASCII characters from a string

Question

I have searched for a solution online, but my case is different: I don't want to remove all non-ASCII characters, only a specific few.

I have a line that looks like this:

"[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"

I want to remove only these chars:

'…' , '⌉' , '⌈'

The text is from here.

I tried to solve it using replace, but whenever I write one of these non-ASCII characters in my source file I get the following error:

SyntaxError: Non-ASCII character '\xe2' in file C:/-------.py on line --, but no encoding declared;

Thanks in advance.

Remember that `ú` is non-ASCII too. – Alastair McCormack Feb 01 '17 at 16:42

宏杰李 · Accepted Answer · 2017-02-03T10:52:58.827

'[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]'.encode().decode('ascii', errors='ignore')

out:

'[x+]4 gur Id l gal sik-kt  x x  []'

Use encode to convert the string to bytes, then decode it as ASCII and ignore the errors. Note that this strips every non-ASCII character, including ú and á.

To remove only specific characters, use re.sub instead:

import re

text = "[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"

re.sub('[…⌉⌈]', '', text)  # replaces every character listed inside [] with ''

out:

'[x+]4 gur Id lú gal sik-kát  x x  []'
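If the set of characters to strip is likely to change, one option is to build the character class from a plain string and let re.escape handle any regex metacharacters. This is only a sketch on Python 3; the names UNWANTED, PATTERN and strip_unwanted are made up for illustration and are not part of the answer.

```python
import re

# Hypothetical names, chosen for this example only.
UNWANTED = "…⌉⌈"

# re.escape keeps the class valid even if a character such as ']' or '^' is added later.
PATTERN = re.compile("[" + re.escape(UNWANTED) + "]")


def strip_unwanted(text):
    """Remove only the characters listed in UNWANTED; leave everything else intact."""
    return PATTERN.sub("", text)


print(strip_unwanted("[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"))
```

Compiling the pattern once is mainly a convenience when the same cleanup is applied to many lines.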

edited Feb 03 '17 at 10:52

answered Feb 01 '17 at 16:44

But I want to keep the ú and á. – Yonlif Feb 03 '17 at 10:49

SparkAndShine · Answer 2 · 2017-02-03T11:00:56.793

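Use str.translate. The u'' prefixes make the strings Unicode on Python 2 as well, so translate removes whole characters rather than individual bytes:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

s = u"[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"

# Map each unwanted code point to None so that translate() deletes it.
r = s.translate({ord(c): None for c in u'…⌉⌈'})

print(r)
# [x+]4 gur Id lú gal sik-kát  x x  []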
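
On Python 3, str.translate takes a single translation table, and str.maketrans can build a table that deletes characters. A minimal sketch, assuming Python 3 (the variable name table is illustrative):

```python
# Assumes Python 3: the third argument of str.maketrans lists the characters to delete.
table = str.maketrans("", "", "…⌉⌈")

s = "[x+]4 gur Id lú gal sik-kát ⌈ x x ⌉ [……………]"
r = s.translate(table)

print(r)
```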

edited Feb 03 '17 at 11:00

answered Feb 01 '17 at 16:55

it's not working - SyntaxError: Non-ASCII character '\xe2' in file C:/----------.py on line 57, but no encoding declared; – Yonlif Feb 03 '17 at 10:47
@Yonlif, add `# -*- coding: utf-8 -*-` at the beginning of the file. Check the updated answer. – SparkAndShine Feb 03 '17 at 11:00
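
The SyntaxError appears on Python 2 because the source file itself contains non-ASCII bytes and declares no encoding. Besides adding the coding declaration, another way around it is to keep the source pure ASCII and write the characters as Unicode escapes. A sketch, using the code points U+2026 (…), U+2308 (⌈) and U+2309 (⌉); the variable names and the shortened sample line are illustrative only.

```python
# Pure-ASCII source, so no coding declaration is required.
# \u2026 is the ellipsis; \u2308 and \u2309 are the left and right ceiling brackets.
unwanted = u"\u2026\u2308\u2309"

# Shortened sample line; \u00fa is u-acute and \u00e1 is a-acute, which must be kept.
line = u"[x+]4 gur Id l\u00fa gal sik-k\u00e1t \u2308 x x \u2309 [\u2026\u2026]"

# Keep every character that is not in the unwanted set.
cleaned = u"".join(ch for ch in line if ch not in unwanted)

print(cleaned)
```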
