UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 100: character maps to <undefined>

Question

I have two files in the same directory:

http://nlp.lsi.upc.edu/awn/AWNDatabaseManagement.py.gz
upc_db.xml, the XML database of Arabic WordNet (http://nlp.lsi.upc.edu/awn/get_bd.php)

When I run the .py file, it raises the error shown in the image. I am trying to check that the .py file works so that I can import it as a WordNet for Arabic words.

Can you help me through it?

Thanks

image for error

Please do not post errors or code or both as images. Include these in your question. See https://stackoverflow.com/help/minimal-reproducible-example. Still, perhaps you should use a different encoding such as UTF-8. — , Oct 27 '20 at 19:27
Does this answer your question? [UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>](https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character) — Ulrich Eckhardt, Oct 27 '20 at 19:35
Also, please search for the error message online. This one is trivial to find! As a new user here, please also take the [tour] and read [ask]. BTW: Without a [mcve], your question is considered off-topic anyway. — Ulrich Eckhardt, Oct 27 '20 at 19:36

Muhammad Afzaal · Answer 1 · 2022-06-30T15:02:01.237

7

To read a text file or database dump like this one, pass `encoding="utf-8"` when opening it. UTF-8 can encode all 1,112,064 valid Unicode code points using one to four one-byte code units, so it is the simplest choice.
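
A minimal sketch of why the explicit encoding matters, assuming the default codec on the asker's machine was a Windows code page such as cp1252 (an assumption; the question does not say). The Arabic kasra (U+0650) is used here only as a sample character whose UTF-8 form contains the byte 0x90.

```python
# ARABIC KASRA (U+0650) encodes to the two bytes D9 90 in UTF-8.
data = "\u0650".encode("utf-8")

# Decoding those bytes with cp1252 (assumed default) fails, because
# 0x90 has no mapping in that code page.
try:
    data.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = exc

# Decoding the same bytes as UTF-8 recovers the original character.
text = data.decode("utf-8")

print(repr(data))
print(cp1252_error)
print(text == "\u0650")
```

Passing `encoding="utf-8"` to `open()` makes Python apply the second decoding instead of the locale default.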

edited Jun 30 '22 at 15:02

answered Mar 15 '21 at 19:00

Muhammad Afzaal


3

`encoding="uft-8"` => `encoding="utf-8"` – matebende Feb 25 '22 at 06:22

score 5 · Answer 2 · edited Apr 04 '22 at 03:59

5

To read the file above in binary mode, use

ent = open(ent, 'rb')

instead of

ent = open(ent)
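
As a rough illustration of what binary mode changes, the sketch below writes a small stand-in file and reads it back with `'rb'`. The sample XML line and the temporary file are hypothetical; they are not taken from upc_db.xml.

```python
import os
import tempfile

# Hypothetical stand-in for a line of the XML database.
sample = '<item value="\u0643\u0650\u062a\u0627\u0628"/>'

with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

try:
    # 'rb' returns raw bytes, so no codec is involved and nothing can
    # raise UnicodeDecodeError at this point.
    with open(path, "rb") as ent:
        raw = ent.read()
finally:
    os.remove(path)

# If text is needed later, decode explicitly with the file's real encoding.
decoded = raw.decode("utf-8")
print(type(raw).__name__, decoded == sample)
```

Binary mode avoids the error because nothing is decoded while reading; the result is `bytes`, not `str`. XML parsers such as `xml.etree.ElementTree.parse` accept a file opened this way.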

edited Apr 04 '22 at 03:59

emen


answered Oct 30 '20 at 11:24

Abdelrahman Yasser


score 0 · Answer 3 · answered Jul 28 '22 at 03:39

0

Try specifying the encoding when you open the file.

with open(path, encoding="utf-8") as f:
    # Read the whole file into a single string
    text = f.read()
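
For a large file, reading line by line with the same explicit encoding may be preferable to a single `read()`. The sketch below is only an illustration; the sample lines and the temporary file are made up for the example.

```python
import os
import tempfile

# Hypothetical sample content containing Arabic text.
lines = ["\u0643\u0650\u062a\u0627\u0628\n", "\u0628\u064e\u064a\u0652\u062a\n"]

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Write and read with the same explicit encoding so the result does
    # not depend on the platform's default codec.
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(lines)
    with open(path, encoding="utf-8") as f:
        read_back = [line for line in f]
finally:
    os.remove(path)

print(read_back == lines)
```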

answered Jul 28 '22 at 03:39

jangles

