Chinese word, UnicodeEncodeError: 'ascii' codec can't encode characters in position 55-56: ordinal not in range(128)

Question

#python 3 version

   ...

# About the place of origin
...
crop = '牛蒡'  # burdock
...

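# Ask the user to enter the data
def rundatainputcircle():
    global market  # without this, the assignment below only creates a local variable
    marketinput = input('* Please enter one of: 1: Taipei 1, 2: Taipei 2, 3: Sanchong City, 4: Taichung City, 5: Kaohsiung City, 6: Fengshan City, 7: Taoyuan County, or leave blank > ')
    if marketinput == '':
        market = ''
    elif len(str(marketinput)) == 1 and 1 <= int(marketinput) <= 7:
        market = uriba[uribalist[int(marketinput)-1] + 1]
    else:
        print('Please enter again: 1: Taipei 1, 2: Taipei 2, 3: Sanchong City, 4: Taichung City, 5: Kaohsiung City, 6: Fengshan City, 7: Taoyuan County, or leave blank > ')
        rundatainputcircle()
rundatainputcircle()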

# Import the data
def rundatacircle():
    url = 'http://m.coa.gov.tw/OpenData/FarmTransData.aspx?' + '$top=' + top + '&$skip=0&crop=' + crop + '&StartDate=' + startdate + '&EndDate=' + enddate
    if market != '':
        url += '&Market=' + market
    else:
        url = url
    url = url.encode('ascii')
    print(url)#test
    urllib.request.urlretrieve(url, "data.gz")
    data_str = open('data.gz', 'r').read()#gzip.open('data.gz', 'r').read()
    gobou_data = json.loads(data_str)
    print(len(gobou_data))#test
    return gobou_data
rawdata = rundatacircle()

It raises this error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 55-56: ordinal not in range(128)

The relevant part of the traceback:

    UnicodeEncodeError                        Traceback (most recent call last)
    in <module>()
         92     print(len(gobou_data))#test
         93     return gobou_data
    ---> 94 rawdata = rundatacircle()
         95 
         96 # Start downloading the data month by month, from the current month all the way back to the beginning of the data, 101.01.01

    in rundatacircle()
         87         url = url
         88     print(url)#test
    ---> 89     urllib.request.urlretrieve(url, "data.gz")#python 3 getting pics from url
         90     data_str = open('data.gz', 'r').read()#gzip.open('data.gz', 'r').read()
         91     gobou_data = json.loads(data_str)

    /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py in urlretrieve(url, filename, reporthook, data)
        185     url_type, path = splittype(url)
        186 
    --> 187     with contextlib.closing(urlopen(url, data)) as fp:
        188         headers = fp.info()
        189 

The full traceback is very long, so I have not included all of it.

I have tried many solutions found on Google and Stack Overflow, but none of them solved the problem, and I don't understand what the error means.
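To see what the message means, here is a minimal sketch that reproduces it without any network access. It rests on an assumption about the standard library's internals rather than on anything shown in the question: that `urllib.request` ends up sending an HTTP request line of the form `GET <path> HTTP/1.1`, which `http.client` encodes as ASCII. The path is taken from the URL the asker printed in a comment under the second answer (`$top=700`). Counting characters, the two characters of `牛蒡` sit at positions 55 and 56, which matches the error exactly: they have code points above 127, so ASCII has no byte for them.

```python
# Request line in the assumed "METHOD path HTTP/1.1" format, built from the
# URL the asker printed (scheme and host are not part of the request line).
REQUEST_LINE = 'GET /OpenData/FarmTransData.aspx?$top=700&$skip=0&crop=牛蒡 HTTP/1.1'


def reproduce_error(text):
    """Try to encode text as ASCII and return the UnicodeEncodeError, if any."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


error = reproduce_error(REQUEST_LINE)
# 'ascii' codec can't encode characters in position 55-56: ordinal not in range(128)
print(error)
# The offending slice is the crop name itself
print(REQUEST_LINE[error.start:error.end])
```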

P.S. The problem occurs in this part of the code. I am using Python 3.5.

`crop` is a Chinese word, and it must not be changed or deleted: if it is removed, the query returns the wrong data.
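A frequently suggested workaround, `encode('ascii', 'ignore')`, explains why the Chinese word vanishes: the `'ignore'` error handler silently drops every character ASCII cannot represent. A lossless alternative is percent-encoding, sketched below with the standard `urllib.parse.quote` and `unquote` functions: the word is converted to its UTF-8 bytes, each byte is written as a `%XX` escape (pure ASCII), and the original text can be recovered from the escapes.

```python
from urllib.parse import quote, unquote

crop = '牛蒡'  # burdock, the value that must survive unchanged

# 'ignore' drops every non-ASCII character, so the crop name is lost.
lossy = ('crop=' + crop).encode('ascii', 'ignore').decode('ascii')

# Percent-encoding writes the UTF-8 bytes of the word as ASCII %XX escapes.
escaped = 'crop=' + quote(crop)

print(lossy)    # crop=
print(escaped)  # crop=%E7%89%9B%E8%92%A1
print(unquote(escaped[len('crop='):]) == crop)  # True: nothing was lost
```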

http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20?rq=1 — Paul, Apr 05 '16 at 12:12
Sorry, but it seems you did not read the results of your searches; see the answers in the duplicates, e.g.: http://stackoverflow.com/questions/3224268/python-unicode-encode-error – bufh, Apr 05 '16 at 12:16
@bufh, thank you. Yes, but then the Chinese word disappears, and without it I cannot get the data I want. – YannYann, Apr 05 '16 at 12:27
@Paul, thank you. I had tried this before, but I still get errors when I use `url = url.encode('utf-8')`, and if I try `url = url.encode('ascii', 'ignore').decode('ascii')`, the Chinese word disappears. – YannYann, Apr 05 '16 at 12:29
@DavidLee no one has a plug-in recipe to solve your problem, and it is very hard to recommend anything without your complete code, data, error tracebacks, etc. You will either have to read about Unicode issues at the webpages linked above and solve the problem yourself, or speak to some experts in Chinese unicode issues and Python. — Paul, Apr 05 '16 at 14:39
@Paul, thank you, I see. I am now trying another method to solve this problem; if I find the solution, I will post it. – YannYann, Apr 06 '16 at 02:14

score 1 · Answer 1 · edited May 23 '17 at 12:07

Finally, I found a way to solve this problem. The solution has two parts.

First, I changed how the Chinese word is encoded in the URL, using `urllib.parse.urlencode`:

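    # requires: import json, urllib.parse, urllib.request
    # urlencode() percent-encodes the Chinese value as UTF-8, e.g. crop=%E7%89%9B%E8%92%A1 for '牛蒡'
    url = ('http://m.coa.gov.tw/OpenData/FarmTransData.aspx?$top=' + top +
           '&$skip=0&' + urllib.parse.urlencode({'crop': crop}) +
           '&StartDate=' + startdate + '&EndDate=' + enddate)
    if market != '':
        url += '&' + urllib.parse.urlencode({'Market': market})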
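The same idea can go one step further by letting `urlencode` build the whole query string. This is a sketch, not the answerer's code: `build_url` is a hypothetical helper, the sample values come from the URL printed in the asker's comment under the second answer (`$top=700`, dates such as `105.04.01`, `Market=台北一`), and `safe='$'` is an assumption that keeps the `$` in `$top`/`$skip` unescaped in case the server expects it literally. A list of pairs is used so the parameter order is preserved on every Python version.

```python
from urllib.parse import urlencode

BASE_URL = 'http://m.coa.gov.tw/OpenData/FarmTransData.aspx'


def build_url(top, crop, startdate, enddate, market=''):
    """Hypothetical helper: urlencode() percent-encodes non-ASCII values as UTF-8."""
    params = [('$top', top), ('$skip', '0'), ('crop', crop),
              ('StartDate', startdate), ('EndDate', enddate)]
    if market:  # Market is optional, as in the original code
        params.append(('Market', market))
    # safe='$' leaves '$top' and '$skip' as they are (assumed server expectation)
    return BASE_URL + '?' + urlencode(params, safe='$')


print(build_url('700', '牛蒡', '105.04.01', '105.04.05', '台北一'))
```

Every character of the resulting URL is ASCII, so nothing needs to be encoded by hand afterwards.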

Then, load the data from this URL:

    # Decode the response bytes as UTF-8 before parsing the JSON text
    # (related: https://stackoverflow.com/questions/28906859/module-has-no-attribute-urlencode)
    data = urllib.request.urlopen(url).read().decode('utf-8')
    # print(type(data)); print(dir(data))  # debugging: inspect the type and its methods
    result = json.loads(data)
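The second step can be wrapped in a function. The response body arrives as `bytes`, so it has to be decoded to `str` before `json.loads`; on Python 3.5, `json.loads` accepts only `str` (support for `bytes` input was added in 3.6). In this sketch `parse_body` and `fetch_data` are hypothetical names, UTF-8 is assumed to be the response encoding as in the answer, and the sample body is made up for an offline check rather than real data from the service.

```python
import json
import urllib.request


def parse_body(raw):
    """Decode raw response bytes as UTF-8 (assumed) and parse the JSON text."""
    text = raw.decode('utf-8')  # bytes -> str
    return json.loads(text)


def fetch_data(url):
    """Download an already percent-encoded (pure ASCII) URL and parse its JSON."""
    # Using urlopen as a context manager closes the connection afterwards
    with urllib.request.urlopen(url) as response:
        return parse_body(response.read())


# Offline check with a made-up body, so no network access is needed
sample = '[{"crop": "牛蒡"}]'.encode('utf-8')
print(parse_body(sample))  # [{'crop': '牛蒡'}]
```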

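Watch out for whether an object has an `encode` or a `decode` method: in Python 3, `str` has `encode()` (text to `bytes`), while `bytes` has `decode()` (`bytes` to text). You can check which one you have with `print(dir(XXX))`.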

(For more background, see the Stack Overflow question "python 3 subprocess error in bytes".)
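A quick way to see the `str`/`bytes` distinction: the sketch below checks for the methods with `hasattr`, which is equivalent to looking for the name in `dir(...)`, and then does a UTF-8 round trip.

```python
word = '牛蒡'                 # str: text
data = word.encode('utf-8')  # bytes: b'\xe7\x89\x9b\xe8\x92\xa1'

print(hasattr(word, 'encode'), hasattr(word, 'decode'))  # True False
print(hasattr(data, 'decode'), hasattr(data, 'encode'))  # True False
print(data.decode('utf-8') == word)                      # True
```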

score 0 · Answer 2 · edited May 08 '19 at 10:21


Your problem is that the 'ascii' encoding you use to encode your URL cannot represent Chinese characters.

After a short web search I found the GB 18030 encoding, which supports Chinese characters. https://en.wikipedia.org/wiki/GB_18030

Try to use this to encode your URL.

answered Apr 05 '16 at 13:08 by Tim Stopfer · edited May 08 '19 at 10:21 by thepurpleowl

Sorry, the 'ascii' encoding line was one of my attempts, and I forgot to delete it. `url = url.encode('GB18030')` also fails. – YannYann Apr 05 '16 at 13:10
With that, the URL becomes: b'http://m.coa.gov.tw/OpenData/FarmTransData.aspx?$top=700&$skip=0&crop=\xc5\xa3\xdd\xf2&StartDate=105.04.01&EndDate=105.04.05&Market=\xcc\xa8\xb1\xb1\xd2\xbb' – YannYann Apr 05 '16 at 13:29
I don't see the point of using exotic encodings from the past while utf-8 will do the job perfectly (and also happens to be the [encoding used by URIs](http://tools.ietf.org/html/rfc3986)). – spectras Oct 23 '16 at 14:06
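Why switching to GB 18030 does not help on its own: both codecs turn the same text into bytes, but different bytes, and the server can only decode the escapes correctly if it uses the same encoding that produced them. The sketch below compares the two with the `encoding` parameter of `urllib.parse.quote`; the GB 18030 escapes match the bytes `\xc5\xa3\xdd\xf2` in the asker's comment. That this server expects UTF-8 is an assumption here, supported only by the asker's report that the UTF-8 `urlencode` approach worked.

```python
from urllib.parse import quote, unquote

crop = '牛蒡'

# Same text, two byte encodings, two different percent-escaped forms
utf8_form = quote(crop)                    # UTF-8 is quote()'s default encoding
gb_form = quote(crop, encoding='gb18030')  # bytes c5 a3 dd f2, as in the comment

print(utf8_form)  # %E7%89%9B%E8%92%A1
print(gb_form)    # %C5%A3%DD%F2

# Decoding with the wrong codec does not give back the original word
print(unquote(gb_form, encoding='utf-8', errors='replace') == crop)  # False
```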
