
I have two files in the same directory:

  1. http://nlp.lsi.upc.edu/awn/AWNDatabaseManagement.py.gz

  2. upc_db.xml, the XML database of Arabic WordNet (http://nlp.lsi.upc.edu/awn/get_bd.php)

When I try to run the .py file, I get the error shown in the image below. I am trying to check that the .py file works so that I can import it as a WordNet for Arabic words.

Can you help me with it?

Thanks

[Image of the error]

emen
  • Please do not post errors or code or both as images. Include these in your question. See https://stackoverflow.com/help/minimal-reproducible-example. Still, perhaps you should use a different encoding such as UTF-8. – Oct 27 '20 at 19:27
  • Does this answer your question? [UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>](https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character) – Ulrich Eckhardt Oct 27 '20 at 19:35
  • Also, please search for the error message online. This one is trivial to find! As a new user here, please also take the [tour] and read [ask]. BTW: Without a [mcve], your question is considered off-topic anyway. – Ulrich Eckhardt Oct 27 '20 at 19:36

3 Answers


To read a text file such as this XML database, pass encoding="utf-8" to open(). UTF-8 can encode all 1,112,064 valid Unicode code points using one to four one-byte code units, so it covers the Arabic text in the file. Simple is best.
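A minimal sketch of why the encoding matters, using a hypothetical sample word rather than data from upc_db.xml. Each Arabic letter takes two bytes in UTF-8, and decoding those bytes with a single-byte Windows code page such as cp1252 (which Python decodes with its 'charmap' codec, the name in the error message) can fail:

```python
# Hypothetical sample word (not taken from the AWN database)
word = "فعل"

# UTF-8 is variable-length: 1 to 4 bytes per code point
sizes = [len(ch.encode("utf-8")) for ch in ["a", "ع", "€", "😀"]]
print(sizes)  # [1, 2, 3, 4]

data = word.encode("utf-8")
print(len(word), len(data))  # 3 code points, 6 bytes

# Decoding with the same codec that produced the bytes restores the text
print(data.decode("utf-8") == word)  # True

# Decoding as cp1252 fails on byte 0x81, the second UTF-8 byte of "ف",
# because that byte is undefined in the cp1252 code page
error = None
try:
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The exact byte that fails depends on the text, but any UTF-8 file containing such bytes will trigger this error when read with cp1252.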

Muhammad Afzaal

To read the file in binary mode, use

ent = open(ent, 'rb')

instead of,

ent = open(ent)
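In binary mode Python does no decoding, so read() returns bytes instead of str. This sidesteps the platform's default codec, but code that then treats the data as text must decode it itself. Whether AWNDatabaseManagement.py handles bytes depends on how it uses the file. An XML parser is a natural fit, since it reads the encoding from the XML declaration. Here is a minimal sketch using the standard library's xml.etree.ElementTree. The file name sample.xml and the <root>/<entry> elements are hypothetical stand-ins, not the real AWN schema:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample XML standing in for upc_db.xml (not the AWN schema)
sample = '<?xml version="1.0" encoding="UTF-8"?>\n<root><entry value="فعل"/></root>'

path = os.path.join(tempfile.mkdtemp(), "sample.xml")
with open(path, "w", encoding="utf-8") as out:
    out.write(sample)

# Binary mode: read() returns raw bytes, no codec is applied
with open(path, "rb") as ent:
    raw = ent.read()
print(type(raw))  # <class 'bytes'>

# Decode explicitly when you need a str
text = raw.decode("utf-8")

# Or pass the binary file object to the XML parser, which uses the
# encoding declared in the XML prolog
with open(path, "rb") as ent:
    root = ET.parse(ent).getroot()
print(root.find("entry").get("value"))
```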
emen

Try specifying the encoding when you open the file:

with open("upc_db.xml", encoding="utf-8") as f:
    # Decode the whole file as UTF-8 into a str
    text = f.read()
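Without encoding=..., text-mode open() in Python 3 falls back to the locale's preferred encoding (unless Python's UTF-8 mode is enabled). On many Windows setups that is cp1252, which is where the 'charmap' codec in the error comes from. A small round-trip sketch, using a hypothetical temporary file rather than the real upc_db.xml:

```python
import locale
import os
import tempfile

# What open() would use if no encoding were given (platform-dependent)
print(locale.getpreferredencoding(False))

path = os.path.join(tempfile.mkdtemp(), "arabic.txt")  # hypothetical file
original = "فعل\n"

# Write and read with the same explicit codec, so the result does not
# depend on the platform default
with open(path, "w", encoding="utf-8") as f:
    f.write(original)

with open(path, encoding="utf-8") as f:
    text = f.read()

print(text == original)  # True
```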
jangles