I am working on a personal project: a web app whose React frontend passes some data to a Node.js backend. For my particular use case I need to use this data in a Python script.
To achieve this, I have the following function in my Node.js server file. It sets up an API endpoint that writes the data to a JSON file. My data contains Cyrillic characters, so I encode it as UTF-8 when I write the JSON file. The Python script is then launched as a Node child process, and its stdout and stderr are logged to the Node console:
const fs = require("fs");
const path = require("path");

app.post('/add', interfaceAnki);

// Passes the JSON to Anki through Python
function interfaceAnki(req, res) {
  const launchPython = () => {
    console.log("launching python child process");
    var spawn = require("child_process").spawn;
    var py = spawn(path.join(__dirname, "../scripts/env/Scripts/python"), ["-u", path.join(__dirname, "../scripts/anki_service.py")]);
    py.stdout.on('data', function (data) {
      console.log(data.toString());
    });
    py.stderr.on('data', function (data) {
      console.log(data.toString());
    });
  };
  fs.writeFile(path.join(__dirname, "../scripts/datasource.json"), JSON.stringify(req.body), { encoding: 'utf8' }, launchPython);
  res.json({ msg: 'success' });
}
A typical example of the contents of this JSON file would be something like this:
{"imageURL":"https://pixabay.com/get/g7fcf38edfbbf40f130e3aa88902ab0ab804f35e3caf48c33b513114889b49afbbdd7fdc7e4ea49f7dff708476fe73e21758ae8733a9db7b745dafccfe10048e0_640.jpg","accented":"соба'ка","pronounciationURL":"https://api.openrussian.org/read/ru/соба'ка","exampleSentence":"Мне нра́вятся соба́ки, а мое́й сестре́ кошки.","extraInfo":"feminine gender"}
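Regarding which encoding to use on the Node side: JSON lets non-ASCII text be stored either as raw characters (here UTF-8 bytes) or as ASCII-only \uXXXX escapes. Node's JSON.stringify keeps the Cyrillic characters as-is, while Python's json.dumps escapes them by default (ensure_ascii=True). A minimal sketch comparing the two forms, using a trimmed copy of the example record above:

```python
import json

# Trimmed copy of the example datasource.json record
record = {"accented": "соба'ка", "extraInfo": "feminine gender"}

# Like Node's JSON.stringify: non-ASCII characters are kept as-is.
as_utf8_text = json.dumps(record, ensure_ascii=False)
# Python's default: non-ASCII characters become \uXXXX escapes, so the text is pure ASCII.
as_ascii_text = json.dumps(record)

print(as_ascii_text)  # → {"accented": "\u0441\u043e\u0431\u0430'\u043a\u0430", "extraInfo": "feminine gender"}
```

Both forms load back to identical strings, so switching the file to escaped ASCII would not by itself stop a print() error: after json.load, the values contain the same Cyrillic characters either way.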
However, I am having encoding difficulties on the Python side. My Python script reads the data from the .json file into a dictionary:
import json
import os
import sys

def read_in():
    path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'datasource.json')
    with open(path, encoding="utf-8") as json_file:
        try:
            data = json.load(json_file)
            print(data["pronounciationURL"])
            return data
        except Exception as e:
            print("An error occured in reading the json datasource: ")
            print(e)
            sys.exit()
However, the line print(data["pronounciationURL"]) throws the following error:
An error occured in reading the json datasource:
'charmap' codec can't encode characters in position 36-39: character maps to <undefined>
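The same message can be reproduced without Node or any file. print() encodes its text with sys.stdout's encoding, and when stdout is a pipe (as it is under spawn), Python on Windows typically uses the ANSI code page rather than UTF-8. Assuming that code page is cp1252 (an assumption; it depends on the Windows locale), encoding the URL gives the same 'charmap' error at the same positions, 36-39, which is exactly the run "соба":

```python
# The URL from datasource.json that fails to print
URL = "https://api.openrussian.org/read/ru/соба'ка"

def try_encode(text: str, encoding: str) -> str:
    """Encode text roughly the way print() does for a stream using `encoding`; return 'ok' or the error."""
    try:
        text.encode(encoding)
        return "ok"
    except UnicodeEncodeError as err:
        return str(err)

# cp1252 stands in for an assumed Windows ANSI code page; utf-8 is what the JSON file uses.
print(try_encode(URL, "cp1252"))  # → 'charmap' codec can't encode characters in position 36-39: character maps to <undefined>
print(try_encode(URL, "utf-8"))   # → ok
```

This would also explain why adding encoding="utf-8" to open() has no effect here: that argument only controls how the file is decoded, not how print() encodes its output.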
If I remove the print statements, I can still use the data to write to a third-party program's database without any issues. However, I am also trying to download a file from a URL contained in the JSON file:
import requests

# type: 'audio' | 'image'
def download_file(type, url, word):
    try:
        extension = ''
        if type == 'audio':
            extension = '.mp3'
        elif type == 'image':
            extension = '.jpg'
        r = requests.get(url)
        print('passes the request')
        with open('./downloads/{word}{ext}'.format(word=word, ext=extension), encoding="utf-8") as file:
            file.write(r.content)
    except Exception as e:
        print("An error occured downloading {type}".format(type=type))
        print(e)
        exit()
This function is called as:
download_file('audio',data["pronounciationURL"],data["accented"])
I get the same error again when calling this function, more specifically from open():
An error occured downloading audio
'charmap' codec can't encode characters in position 50-53: character maps to <undefined>
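The reported positions may be a clue. open() defaults to read mode ('r'), so the call in download_file would raise FileNotFoundError for a file that does not exist yet, independently of any encoding. In the message of that exception, the Cyrillic word starts at position 50, matching the reported 50-53. One plausible reading, then, is that open() fails for an ordinary reason and print(e) then fails to print the error message. The sketch below reconstructs that message and shows a helper that writes the response bytes in 'wb' mode instead; unencodable_span and save_download are illustrative names, and cp1252 is an assumed code page:

```python
from pathlib import Path

def unencodable_span(text, encoding="cp1252"):
    """Return the (first, last) positions that `encoding` cannot represent, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as err:
        return err.start, err.end - 1
    return None

# Hypothetical reconstruction of the exception open() raises in read mode
# for a missing file, when download_file('audio', ..., "соба'ка") is called.
missing = "./downloads/{word}{ext}".format(word="соба'ка", ext=".mp3")
message = str(FileNotFoundError(2, "No such file or directory", missing))
print(unencodable_span(message))  # → (50, 53)

def save_download(content: bytes, word: str, extension: str, directory: str = "downloads") -> Path:
    """Write downloaded bytes (e.g. requests.get(url).content) to <directory>/<word><extension>."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)  # open() does not create missing folders
    target = target_dir / f"{word}{extension}"
    # 'wb' creates or overwrites the file in binary mode; encoding= is not allowed
    # (and not needed) here because the payload is raw bytes.
    with open(target, "wb") as f:
        f.write(content)
    return target
```

With a helper like this, download_file would pass r.content instead of writing through a read-mode text file; the Cyrillic word then only appears in the file name.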
Other people have asked about this error here; however, I have tried the accepted suggestion of adding encoding='utf-8' without success. In general, I would like to understand why this error occurs in the first place, and what a better approach to working with Cyrillic characters would be. In particular, what encoding should I use at the source (Node.js) to ensure cross-compatibility between languages before the data even reaches Python? I don't seem to have any trouble working with the data in Node.js, only in Python 3.
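A minimal sketch of one way to handle the stdout side: make the child Python write UTF-8 to its pipe, and decode it as UTF-8 in the parent. It uses Python's subprocess as a stand-in for Node's spawn, and run_child and WORD are hypothetical names. The Node equivalent of option 1 would be passing { env: { ...process.env, PYTHONIOENCODING: "utf-8" } } as the options argument to spawn; setting PYTHONUTF8=1, or running python -X utf8, enables Python 3.7+'s UTF-8 mode, which has a similar effect on stdio.

```python
import os
import subprocess
import sys

WORD = "соба'ка"

def run_child(code, extra_env=None):
    """Run `code` in a child Python with stdout piped (as Node's spawn does); decode its output as UTF-8."""
    env = dict(os.environ)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-u", "-c", code],
        env=env,
        capture_output=True,  # Python 3.7+
        check=True,
    )
    return result.stdout.decode("utf-8").strip()

# Option 1: the parent tells the child to use UTF-8 for stdin/stdout/stderr.
via_env = run_child(f"print({WORD!r})", {"PYTHONIOENCODING": "utf-8"})

# Option 2 (Python 3.7+): the script switches its own stdout to UTF-8 before printing.
via_reconfigure = run_child(f"import sys; sys.stdout.reconfigure(encoding='utf-8'); print({WORD!r})")
```

On the Node side, data.toString() already decodes a Buffer as UTF-8 by default; calling py.stdout.setEncoding('utf8') additionally avoids splitting a multi-byte character across two 'data' chunks. Under these assumptions, keeping the JSON file as UTF-8, as the server already does, looks fine.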