Python script to convert from UTF-8 to ASCII

Question

I'm trying to write a script in Python to convert UTF-8 files into ASCII files:

#!/usr/bin/env python
# *-* coding: iso-8859-1 *-*

import sys
import os

filePath = "test.lrc"
fichier = open(filePath, "rb")
contentOfFile = fichier.read()
fichier.close()

fichierTemp = open("tempASCII", "w")
fichierTemp.write(contentOfFile.encode("ASCII", 'ignore'))
fichierTemp.close()

When I run this script I get the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 13: ordinal not in range(128)

I thought I could ignore the error by passing the 'ignore' parameter to the encode method, but apparently not.

I'm open to other ways to convert.

You got the error because the character doesn't exist in the ASCII character set, so it can't be converted. Sometimes you can map the UTF-8 character to the closest visual-fit character in ASCII, such as `é` to `e`, but that can change the meaning of words. You have to decide if that path will work for your application. – the Tin Man, Nov 28 '10 at 23:24
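
The "closest visual-fit" mapping mentioned in this comment can be approximated with the standard `unicodedata` module. The following is a minimal sketch in Python 3, not something taken from the thread: the helper name `to_ascii_approx` is hypothetical, and characters with no ASCII decomposition are still silently dropped.

```python
import unicodedata


def to_ascii_approx(text):
    """Return an ASCII-only approximation of text (hypothetical helper)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # errors="ignore" then drops the combining marks, along with any
    # character that has no ASCII decomposition at all.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii_approx("caf\u00e9"))  # "\u00e9" is é; prints: cafe
```

As the comment warns, this can change the meaning of words, so whether it is acceptable depends on the application.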

score 69 · Accepted Answer · answered Nov 28 '10 at 23:13

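data = "UTF-8 DATA"                          # Python 2 byte string holding UTF-8 bytes
udata = data.decode("utf-8")                 # decode the bytes into a unicode object
asciidata = udata.encode("ascii", "ignore")  # encode to ASCII, dropping unencodable characters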

Utku Zihnioglu

Sounds like a bad recipe for data loss. – tchrist Nov 28 '10 at 23:55
You should expect data loss if you wish to convert from an 8-bit encoding to a 7-bit one. – Utku Zihnioglu Nov 29 '10 at 00:01
I overlooked that I have to decode first. It works now, thanks. To answer the questions: I want to do this because my MP3 player can only display lyrics files encoded in ASCII. – Nicolas Nov 29 '10 at 21:33
You can have a look at this solution: http://stackoverflow.com/a/517974/1463812 – JSBach Mar 27 '17 at 05:08
I get `AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?` for the second line with python 3.10.4 – peer Jun 08 '23 at 22:34
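
The `AttributeError` in the last comment arises because the accepted answer is Python 2 code. In Python 2, calling `encode` on a byte string first decodes it implicitly with the ASCII codec, which is where the `UnicodeDecodeError` in the question comes from. In Python 3, only `bytes` has `decode` and only `str` has `encode`. A minimal Python 3 sketch of the same decode-then-encode steps, with a made-up sample value standing in for the file contents:

```python
# In Python 3, bytes hold the raw UTF-8 data and str holds decoded text.
data = "UTF-8 DATA \u00e9".encode("utf-8")   # stand-in for bytes read from a file
udata = data.decode("utf-8")                  # bytes -> str
asciidata = udata.encode("ascii", "ignore")   # str -> ASCII bytes, non-ASCII dropped

print(asciidata)  # prints: b'UTF-8 DATA '
```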

score 9 · Answer 2 · answered Nov 28 '10 at 23:23

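import codecs

 ...

# Open the input as UTF-8 so that reads return decoded unicode text.
fichier = codecs.open(filePath, "r", encoding="utf-8")

 ...

# Open the output as ASCII; errors="ignore" drops characters ASCII cannot represent.
fichierTemp = codecs.open("tempASCII", "w", encoding="ascii", errors="ignore")
fichierTemp.write(contentOfFile)

 ...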
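
In Python 3 the built-in `open` accepts the same `encoding` and `errors` arguments, so `codecs.open` is not required. The following is a sketch under that assumption; the function name `convert_to_ascii` is hypothetical and does not appear in the answer.

```python
def convert_to_ascii(src_path, dst_path):
    """Copy a UTF-8 text file to dst_path, dropping non-ASCII characters."""
    # Reading with encoding="utf-8" yields decoded text (str).
    with open(src_path, "r", encoding="utf-8") as src:
        content = src.read()
    # errors="ignore" makes the ASCII encoder skip unencodable characters.
    with open(dst_path, "w", encoding="ascii", errors="ignore") as dst:
        dst.write(content)
```

With the file names from the question, the call would be `convert_to_ascii("test.lrc", "tempASCII")`.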

Ignacio Vazquez-Abrams

score 6 · Answer 3 · answered Nov 28 '10 at 23:26

UTF-8 is a superset of ASCII. Either your UTF-8 file is ASCII, or it can't be converted without loss.
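
To find out which of the two cases applies to a given file, one option is to try decoding its bytes as ASCII. This is a minimal Python 3 sketch; the helper name `is_pure_ascii` is hypothetical.

```python
def is_pure_ascii(data):
    """Return True if the byte string contains only ASCII bytes (0x00-0x7F)."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


print(is_pure_ascii(b"plain text"))                 # prints: True
print(is_pure_ascii("caf\u00e9".encode("utf-8")))   # prints: False
```

If the check returns True, the file is already valid ASCII and needs no conversion.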

Tobu

I think he's aware of that, otherwise he wouldn't be trying to use `'ignore'`. – Ignacio Vazquez-Abrams Nov 28 '10 at 23:28
@Ignacio True. But this one left me wondering what the asker is trying to achieve. They could be cargo-culting, or maybe their need is best met by something like urlencode, or being lossy is just acceptable. – Tobu Nov 28 '10 at 23:43
I am afraid of the cargo-culting. Culling all characters that you don’t have an appreciation for is really insensitive. – tchrist Nov 28 '10 at 23:58
@Ignacio: Imagine being addressed as *Vzquez-Abrams*. :( – tchrist Nov 29 '10 at 00:11
@tchrist: That's why I never use it. – Ignacio Vazquez-Abrams Nov 29 '10 at 00:13
Sometimes you can convert UTF-8 to ASCII without loss. For instance, single quotes or apostrophes and, in a few other cases, arithmetic operators are available both as multi-byte UTF-8 characters and as single ASCII symbols. – Kovalex Nov 29 '21 at 06:49
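
The substitution described in this last comment could be sketched in Python 3 with `str.translate`. The mapping below is a hypothetical example chosen for illustration, not a complete or standard table, and whether each replacement counts as lossless depends on the application.

```python
# Hypothetical mapping, chosen for illustration; extend it as the data requires.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / typographic apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2212": "-",   # minus sign
    "\u00d7": "*",   # multiplication sign
})


def replace_lookalikes(text):
    """Replace selected typographic characters with ASCII equivalents."""
    return text.translate(ASCII_EQUIVALENTS)


print(replace_lookalikes("it\u2019s 3 \u2212 1"))  # prints: it's 3 - 1
```

Running this before encoding with `errors="ignore"` keeps those characters in the output instead of dropping them.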
