I have a simple script that I'm attempting to use to automate some of the Japanese translation I do for my job.
import requests
import sys
import json
base_url = 'https://www.googleapis.com/language/translate/v2?key=CANT_SHARE_THAT&source=ja&target=en&q='
print(sys.argv[1])
base_url += sys.argv[1]
request = requests.get( base_url )
if request.status_code != 200:
    print("Error on request")
print( json.loads(request.text)['data']['translations'][0]['translatedText'])
When the first argument is a string like 初期設定クリア, the script fails at the line
print(sys.argv[1])
With the message:
line 5, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in
position 0-6: character maps to <undefined>
So the bug can be reduced to
import sys
print(sys.argv[1])
Which seems like an encoding problem. I'm using Python 3.5.1, and the terminal is MINGW64 under Windows 7 x64.
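One way to check the encoding hypothesis is to ask Python which codec it uses for stdout and whether that codec can represent the argument. The sketch below is a minimal diagnostic, not a fix; the helper names `can_encode` and `printable` are made up for illustration, and falling back to "ascii" when no encoding is reported is an assumption.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def printable(text, encoding):
    """Return a copy of text that is safe to print with the given encoding.

    Characters the codec cannot represent are replaced by backslash escapes.
    """
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


if __name__ == "__main__":
    # Use the sample string when no argument is given.
    arg = sys.argv[1] if len(sys.argv) > 1 else "初期設定クリア"
    # The codec print() uses for this terminal; may be None for some streams.
    enc = sys.stdout.encoding or "ascii"
    print("stdout encoding:", enc)
    print("representable:", can_encode(arg, enc))
    # This line cannot raise UnicodeEncodeError, whatever enc is.
    print(printable(arg, enc))
```

If the reported encoding is a legacy code page rather than UTF-8, that would be consistent with the 'charmap' error above: the argument itself arrives intact, and only the final encode step inside print fails.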
When I write the same program in Rust 1.8 (and the executable is run under the same conditions, i.e. MINGW64 under Windows 7 x64)
use std::env;
fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    print!("First arg: {}", &args[0]);
}
It produces the proper output:
$ rustc unicode_example.rs
$ ./unicode_example.exe 初期設定クリア
First arg: 初期設定クリア
So I'm trying to understand what is happening here. MINGW64 claims to have proper UTF-8 support, which it appears to. Does Python 3.5.1 not have full UTF-8 support? I was under the assumption that the move to Python 3.x was because of Unicode support.