I want my Python script to be able to decode UTF-8 encoded strings. The script works fine when run from the console and in the Python interpreter. However, when I run it as a Node.js child process, I get the following error:
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    print(b'\xe1\x83\x90'.decode("utf-8"))
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u10d0' in position 0: character maps to <undefined>
My Node.js file (index.js) looks like this:
const spawn = require("child_process").spawn;
const python = spawn("python", ["test.py"]);
python.stdout.pipe(process.stdout);
python.stderr.pipe(process.stdout);
My Python file (test.py) looks like this:
print(b'\xe1\x83\x90'.decode("utf-8"))
Edit:
Working code:
import sys
sys.stdout.reconfigure(encoding='utf-8')
print(b'\xe1\x83\x90'.decode())
We can use print(sys.stdout.encoding) to check the encoding of sys.stdout.
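The check and the fix can also be combined, so the stream is only reconfigured when it is not already UTF-8. This is a minimal sketch, not the only way to do it: the helper name ensure_utf8 is hypothetical, and it assumes Python 3.7 or later, where text streams have a reconfigure() method.

```python
import codecs
import sys


def ensure_utf8(stream):
    """Switch a text stream to UTF-8 if it uses another encoding.

    `ensure_utf8` is a hypothetical helper name. It relies on
    TextIOWrapper.reconfigure(), which exists in Python 3.7+.
    Returns the encoding of the stream after the call.
    """
    current = stream.encoding or "ascii"
    # codecs.lookup() normalizes aliases such as "UTF8" or "utf_8" to "utf-8".
    if codecs.lookup(current).name != "utf-8":
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8")
    return stream.encoding


if __name__ == "__main__":
    # Report the encoding in use after the check.
    print(ensure_utf8(sys.stdout))
```

Calling ensure_utf8(sys.stdout) at the top of test.py, before the first print, should have the same effect as the reconfigure line above while leaving an already-UTF-8 stream untouched.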
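When the script runs under Node.js, the encoding of stdout is "cp1252" instead of "utf-8". As far as I can tell, Node.js does not change anything itself: when stdout is a pipe rather than a console, Python on Windows falls back to the locale's ANSI code page (cp1252 here), and sys.stdout.reconfigure(encoding='utf-8') switches it to "utf-8". The decoding was never the problem. The problem was printing a character ('\u10d0') that cp1252 cannot represent.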
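The failure can be reproduced without Node.js by encoding the same character by hand. The sketch below only illustrates the mechanism; can_encode is a hypothetical helper, not part of any library.

```python
# The bytes from test.py decode to a single character, U+10D0.
text = b"\xe1\x83\x90".decode("utf-8")


def can_encode(s, encoding):
    """Return True if every character of `s` is representable in `encoding`.

    `can_encode` is a hypothetical helper used only for illustration.
    """
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# print() has to encode the string with the stream's encoding before writing,
# which is the step that fails when that encoding is cp1252.
utf8_ok = can_encode(text, "utf-8")
cp1252_ok = can_encode(text, "cp1252")
```

Here utf8_ok is True and cp1252_ok is False, which matches the UnicodeEncodeError raised from cp1252.py in the traceback.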