"charmap codec can't encode characters" error with cyrillic letters in Python3

Question

I am working on a personal project: a web app whose frontend (React) passes some data to a Node.js backend. For my particular use case I need to use this data in a Python script.

To achieve this I have the following function in my Node.js server file. It sets up an API endpoint that writes the data to a JSON file. My data contains Cyrillic characters, so I encode it as UTF-8 when I write the file. The Python script is then launched as a Node child process, and the server logs whatever it writes to stdout and stderr:

app.post('/add',interfaceAnki);
//Function which passes the JSON to anki itself through python
function interfaceAnki(req,res) {

  const launchPython = () => {
    console.log("launching python child process");
    var spawn = require("child_process").spawn;
    var py = spawn(path.join(__dirname, "../scripts/env/Scripts/python"),["-u",path.join(__dirname, "../scripts/anki_service.py")]);

    py.stdout.on('data', function(data) {
      console.log(data.toString());
    });

    py.stderr.on('data', function (data){
      console.log(data.toString());
    });
  }

  fs.writeFile(path.join(__dirname, "../scripts/datasource.json"), JSON.stringify(req.body),{encoding: 'utf8'},launchPython);  
  res.json({ msg: 'success' });  
}

A typical example of the contents of this JSON file would be something like this:

{"imageURL":"https://pixabay.com/get/g7fcf38edfbbf40f130e3aa88902ab0ab804f35e3caf48c33b513114889b49afbbdd7fdc7e4ea49f7dff708476fe73e21758ae8733a9db7b745dafccfe10048e0_640.jpg","accented":"соба'ка","pronounciationURL":"https://api.openrussian.org/read/ru/соба'ка","exampleSentence":"Мне нра́вятся соба́ки, а мое́й сестре́ кошки.","extraInfo":"feminine gender"}

I am having difficulties with encoding on the Python side, however. My Python script reads the data from the JSON file into a dictionary:

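import json
import os
import sys

def read_in():
    # Build the path with os.path.join rather than embedding a backslash in the literal
    path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'datasource.json')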
    with open(path,encoding="utf-8") as json_file:
        try:
            data = json.load(json_file)
            print(data["pronounciationURL"])
            return data
        except Exception as e:
            print("An error occured in reading the json datasource: ")
            print(e)
            sys.exit()

The line print(data["pronounciationURL"]), however, throws the following error:

An error occured in reading the json datasource: 
'charmap' codec can't encode characters in position 36-39: character maps to <undefined>
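
The encoding="utf-8" argument to open() only governs how the file is read; print() encodes its text with the encoding of stdout, which is a separate setting. The sketch below reproduces the message without a console. It assumes the child process's stdout falls back to a legacy Windows code page such as cp1252 when it is piped to Node (the env/Scripts/python path suggests Windows, but the actual code page is an assumption). print_through is a hypothetical helper written for this illustration.

```python
import io


def print_through(text, encoding):
    """Print text to an in-memory stream that encodes the way a piped stdout would."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()


url = "https://api.openrussian.org/read/ru/соба'ка"

# Assumed legacy code page: the Cyrillic letters have no cp1252 mapping,
# so the write performed by print() raises UnicodeEncodeError.
try:
    print_through(url, "cp1252")
    legacy_error = None
except UnicodeEncodeError as exc:
    legacy_error = str(exc)

# With UTF-8 the same print succeeds and produces the UTF-8 bytes of the URL.
utf8_bytes = print_through(url, "utf-8")
```

If that assumption holds, two commonly suggested options are calling sys.stdout.reconfigure(encoding="utf-8") at the top of the script (available from Python 3.7) or setting the PYTHONIOENCODING=utf-8 environment variable for the child process.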

I find that if I remove the print statements, I can still use the data to write to a third-party program's database without any issues. However, I am also trying to download a file from a URL contained in the JSON file:

#type 'audio' | 'image'
def download_file(type,url,word):
    try:
        extension = ''
        if(type=='audio'):
            extension = '.mp3'
        elif(type=='image'):
            extension='.jpg'
        r = requests.get(url)
        print('passes the request')
        with open('./downloads/{word}{ext}'.format(word=word,ext=extension),encoding="utf-8") as file:
            file.write(r.content)
    except Exception as e:
        print("An error occured downloading {type}".format(type=type))
        print(e)    
        exit()

This function is called as:

download_file('audio',data["pronounciationURL"],data["accented"])

And I get the same error again when calling this function, more specifically from open():

An error occured downloading audio
'charmap' codec can't encode characters in position 50-53: character maps to <undefined>
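
Separately from the encoding message, the open() call in download_file uses the default read mode with a text encoding, while r.content from requests is bytes, so the write could not succeed as posted. A minimal sketch of the save step follows, assuming a binary write is what was intended. save_bytes and its directory parameter are hypothetical names, and the sketch is not claimed to resolve the 'charmap' error itself.

```python
import tempfile
from pathlib import Path


def save_bytes(content, word, extension, directory="downloads"):
    """Write raw bytes (for example r.content) to <directory>/<word><extension>."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{word}{extension}"
    # "wb" opens for binary writing, so no text encoding applies to the content.
    with open(target, "wb") as fh:
        fh.write(content)
    return target


# Small demonstration with placeholder bytes and a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_bytes(b"\x00\x01fake-mp3", "соба'ка", ".mp3", directory=tmp)
    saved_name = saved.name
    round_trip = saved.read_bytes()
```

Inside download_file this would be called as save_bytes(r.content, word, extension) after the request returns.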

Other people have asked about this error here; however, I have tried the accepted suggestion of adding encoding='utf-8' without success. In general I would like to understand why this error occurs in the first place, and what a better approach to working with Cyrillic characters would be: possibly which encoding I should use at the source (Node.js) to keep the data compatible across languages before it even reaches Python. I don't seem to have any trouble working with the data in Node.js, only in Python 3.

Does the filepath that you are trying to open in `download_file`contain cyrillic or other non-latin characters? — snakecharmerb, May 16 '21 at 10:12
Yes, I'm basically trying to download an audio file of a russian word from a remote source. The remote source contains cyrillic in the pathname and I'm also saving the file locally using the word itself which is cyrillic. eg) ./downloads/соба'ка.mp3 — Blargian, May 16 '21 at 10:16
[This answer](https://stackoverflow.com/a/33859537/5320906) seems to suggest that changing your console to handle unicode may be the solution. See [this](https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line) and [this](https://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console) — snakecharmerb, May 16 '21 at 10:40