I'm trying to get a Python program to extract data from a website and write it to a JSON file. Here is the part of the code that creates the JSON file:

with open(city + '.json', 'w', encoding='utf8') as file:
    json.dump(data, file, ensure_ascii=False)

with open('citys.txt', "r") as file:
    citys = file.readlines()

When I try to execute the code, I get the following error:

Traceback (most recent call last):
  File "main3.py", line 173, in <module>
    get_data_from_company(company_link['href'],city_name_id[0])
  File "main3.py", line 132, in get_data_from_company
    data = json.load(file)
  File "C:\Python38\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 7738: character maps to <undefined>
  • Please provide the json file, if it is too long, you can just provide the part which got error. – leaf_yakitori Aug 04 '21 at 09:34
  • due to the error no file is generated – Med_Maliari Aug 04 '21 at 09:45
  • I mean the origin `{city}.json` file before you run the code. – leaf_yakitori Aug 04 '21 at 09:46
  • Your traceback indicates a problem in a function which is _reading_ data from a file. The code you posted is fine, and cannot produce the error you are asking about. Please [edit] to show the _actual_ code which generates the traceback, and ideally also a minimized version of the JSON file so we can see what's there. We need to see the _actual bytes_ which represent any non-ASCII characters. See also the guidance for providing a [mre] and https://meta.stackoverflow.com/questions/379403/problematic-questions-about-decoding-errors – tripleee Aug 04 '21 at 09:55
  • no the file will be created after executing the code – Med_Maliari Aug 04 '21 at 10:26
  • Please [don’t post images of code, error messages, or other textual data.](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors) – tripleee Aug 04 '21 at 10:41
  • Again, you are writing the file just fine. The _reading_ code is flawed. – tripleee Aug 04 '21 at 10:42
  • the code allows me to read and extract lines from a text file see the modification – Med_Maliari Aug 04 '21 at 10:48
  • The code in the question still does not match the traceback. Again, you want to provide a [mre] – tripleee Aug 04 '21 at 10:57
  • Duplicate of https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character – tripleee Aug 04 '21 at 10:57

1 Answer

You did not specify an encoding in the second `open` call, so Python falls back to your system's default encoding. On Windows that is usually a legacy code page (in your case, code page 1252), which cannot decode UTF-8 correctly.
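To see why the decode fails, here is a minimal sketch that reproduces the same kind of error without any file. The character "ď" is only an illustrative assumption; any character whose UTF-8 encoding contains the byte 0x8f behaves the same way, and the actual character in your data is unknown.

```python
import locale

# The encoding open() falls back to when none is given (platform-dependent;
# typically cp1252 on a Western European Windows setup).
default_encoding = locale.getpreferredencoding(False)

# Illustrative assumption: "ď" (U+010F) is encoded in UTF-8 as 0xC4 0x8F.
utf8_bytes = "ď".encode("utf-8")

# Byte 0x8f has no mapping in code page 1252, so decoding raises.
try:
    utf8_bytes.decode("cp1252")
    decode_failed = False
    error_message = ""
except UnicodeDecodeError as exc:
    decode_failed = True
    error_message = str(exc)

print(default_encoding)
print(decode_failed, error_message)
```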

The fix is easy: specify `encoding='utf-8'`, just like in the first `open` call.
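As a rough sketch of what that looks like, the snippet below writes and reads a JSON file with the same explicit encoding on both sides. The sample `data` and the temporary file path are hypothetical stand-ins, since the question does not show the real reading code around `json.load`.

```python
import json
import os
import tempfile

# Hypothetical sample data containing a non-ASCII character.
data = {"city": "Ďáblice"}

# Hypothetical path; the real code builds it from the city name.
path = os.path.join(tempfile.mkdtemp(), "city.json")

# Write as UTF-8, keeping non-ASCII characters unescaped.
with open(path, "w", encoding="utf-8") as file:
    json.dump(data, file, ensure_ascii=False)

# Read back with the same encoding instead of the platform default.
with open(path, "r", encoding="utf-8") as file:
    loaded = json.load(file)

print(loaded == data)
```

Passing the encoding explicitly on every `open` call keeps the behaviour the same regardless of the machine's default code page.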

tripleee