How to replace non-ASCII characters in Python

Question

I need to replace non-ASCII characters like ¾ in Python, but I get:

SyntaxError: Non-ASCII character '\xc2' in file test.py but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

After following the directions on the webpage, I am getting

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 449: ordinal not in range(128)

Here's my code:

data = data.replace(u"½", u"1/2")
data = re.sub(u"¾", u"3/4", data, flags=re.DOTALL)

What do I need to change in my code?

My file is:

#!/usr/bin/python

with codecs.open("file.txt", "r", "utf8") as myfile:
    data = myfile.read()

data = data.replace(u"½", u"1/2")

file.txt is:

hello world ½

http://stackoverflow.com/questions/20078816/replace-non-ascii-characters-with-a-single-space — Tushar Gupta, Mar 22 '16 at 16:20
If it can remove it, you can use it to replace it. Try something ;) — Tushar Gupta, Mar 22 '16 at 16:23
I did try a lot, that is why I am asking, I am out of ideas now — Erik, Mar 22 '16 at 16:25
Can you provide the content (or a small part of it) of `data`? — Billal Begueradj, Mar 22 '16 at 16:25
@BillBEGUERADJ the data varies between 100 b to 1k, mostly english with "¼½¾" char inside — Erik, Mar 22 '16 at 16:29
The error should be raised by the `open` call, not by the `replace` call. Show the complete stacktrace and the relevant code. — syntonym, Mar 22 '16 at 16:31
The encoding you specify needs to match the encoding that your editor uses when you save the file. What editor are you using? Are you on Windows? — John La Rooy, Mar 22 '16 at 16:31

wim · Accepted Answer · 2017-04-03T19:13:22.710

0

You're reading the file into the local variable data as bytes, but then treating it as if it were already a unicode object.

Change this:

with open(file_name, "r") as myfile:
    data = myfile.read()

To this:

import io

with io.open(file_name, encoding="utf8") as myfile:
    data = myfile.read()
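
As a minimal sketch of the change above, assuming the input really is saved as UTF-8: the temporary file below is a hypothetical stand-in for file.txt from the question, and the fraction is written as the escape \u00bd so the script itself contains only ASCII and needs no encoding declaration.

```python
import io
import os
import tempfile

# Hypothetical sample file standing in for file.txt from the question.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with io.open(path, "w", encoding="utf8") as f:
    f.write(u"hello world \u00bd\n")

# io.open decodes while reading, so data is unicode text rather than bytes.
with io.open(path, encoding="utf8") as myfile:
    data = myfile.read()

# u"\u00bd" is the one-half sign written as an escape sequence.
data = data.replace(u"\u00bd", u"1/2")
print(data)
```

Writing the character as an escape is one way to sidestep the SyntaxError from the question, because the source file then holds no non-ASCII bytes at all.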

edited Apr 03 '17 at 19:13

answered Mar 22 '16 at 16:40

wim

Still get: SyntaxError: Non-ASCII character '\xc2' in file – Erik Mar 22 '16 at 16:52

score -1 · Answer 2 · edited May 23 '17 at 11:46

It looks like you want to read the file as unicode, but Python reads it as a byte string. Try this answer; the question looks similar to your UnicodeDecodeError:

https://stackoverflow.com/a/18649608/5504999

Try adding `#coding: utf-8` at the top of your file. This allows non-ASCII characters in the source code.
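
A rough sketch of a source file that uses such a declaration, assuming the editor actually saves the file as UTF-8 (the declaration and the saved encoding have to agree). The names FRACTIONS and replace_fractions are hypothetical and only serve the illustration.

```python
# -*- coding: utf-8 -*-
# Per PEP 263 the declaration must be on line 1 or 2 of the source file.

# Hypothetical lookup table: fraction character -> ASCII spelling.
FRACTIONS = {
    u"¼": u"1/4",
    u"½": u"1/2",
    u"¾": u"3/4",
}


def replace_fractions(text):
    """Return text with each fraction character swapped for its ASCII form."""
    for char, ascii_form in FRACTIONS.items():
        text = text.replace(char, ascii_form)
    return text


print(replace_fractions(u"hello world ½"))
```

The declaration only tells the interpreter how to decode the source file; the data read from file.txt still has to be decoded separately.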

Community

answered Mar 22 '16 at 16:30

Imtiaz Raqib

I get: UnicodeEncodeError: 'ascii' codec can't encode character u'\uf057' in position 383: ordinal not in range(128) – Erik Mar 22 '16 at 16:36
Did you try reading your first parameter in replace() with **u.decode('utf-8')**? – Imtiaz Raqib Mar 22 '16 at 16:39
Look at @wim's answer. – Imtiaz Raqib Mar 22 '16 at 16:50
I tried: with codecs.open(HTML_PATH + file_name, "r", "utf8") as myfile: data = myfile.read() data = data.replace(u"½", u"1/2") and i get: SyntaxError: Non-ASCII character '\xc2' in file – Erik Mar 22 '16 at 16:52
Try my answer, i.e., add `#coding: utf-8` at the top of your file. It allows the program to read non-ASCII characters. – Imtiaz Raqib Mar 22 '16 at 16:58

score -1 · Answer 3 · answered Mar 22 '16 at 16:42

I think your initial string has not been properly decoded to unicode.

What you are attempting works just fine:

>>> st=u"¼½¾"
>>> print st.replace(u"½", u"1/2")
¼1/2¾

But the target needs to be unicode to start with.
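
To illustrate that last point, here is a small sketch assuming the raw input is UTF-8 bytes, which is what a plain binary read of the file would return. The variable names are hypothetical. The bytes are decoded first, and only then is the replacement applied.

```python
# UTF-8 bytes for "hello world ½"; the one-half sign is encoded as C2 BD.
raw = b"hello world \xc2\xbd"

# Decode to unicode text before calling replace with a unicode pattern.
text = raw.decode("utf-8")
result = text.replace(u"\u00bd", u"1/2")
print(result)
```

The \xc2 byte here is the same one named in the SyntaxError from the question: it is the first byte of the UTF-8 encoding of ¼, ½ and ¾.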

dawg

That's exactly what my code does: `data.replace(u"½", u"1/2")`, but it does not work – Erik Mar 22 '16 at 16:47
`data` is not a unicode string. That is why it is not working for you. Look at wim's answer. – dawg Mar 22 '16 at 16:47
I tried: with codecs.open(HTML_PATH + file_name, "r", "utf8") as myfile: data = myfile.read() data = data.replace(u"½", u"1/2") and i get: SyntaxError: Non-ASCII character '\xc2' in file – Erik Mar 22 '16 at 16:50
