Traceback (most recent call last):
  File "AutomationTool.py", line 2, in <module>
    import MultiProcessController, RedisUtil, ADUtils, json, time
  File "/var/www/html/ARB-Automation/MultiProcessController.py", line 2, in <module>
    import AdTitleExtraction, ADUtils, AdwordsClient, RedisUtil, FinalURLRetrieval, ClusterStrategy, \
  File "/var/www/html/ARB-Automation/AdTitleExtraction.py", line 2, in <module>
    import Config, ADUtils, re, wordsegment as WS, queue, threading, time
  File "/var/www/html/ARB-Automation/ADUtils.py", line 3, in <module>
    import pymssql, pymysql, wordsegment as WS, gc
  File "/usr/local/lib/python3.4/site-packages/wordsegment.py", line 49, in <module>
    bigram_counts = parse_file(join(basepath, 'bigrams.txt'))
  File "/usr/local/lib/python3.4/site-packages/wordsegment.py", line 45, in parse_file
    return dict((word, float(number)) for word, number in lines)
  File "/usr/local/lib/python3.4/site-packages/wordsegment.py", line 45, in <genexpr>
    return dict((word, float(number)) for word, number in lines)
  File "/usr/local/lib/python3.4/site-packages/wordsegment.py", line 44, in <genexpr>
    lines = (line.split('\t') for line in fptr)
  File "/usr/local/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1286: ordinal not in range(128)

I'm trying to use wordsegment in my Python code with Python 3.4.4. It works on my local machine, but after deploying to the production server I get the error above, and I have no idea why. Could someone help me out, please?
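For reference, byte 0xc3 is the lead byte of a two-byte UTF-8 sequence (for example, `é` is encoded as `b'\xc3\xa9'`), so the error means UTF-8 data is being decoded with the ASCII codec. A minimal reproduction, independent of wordsegment (the helper name `ascii_decode_error` is just for illustration):

```python
def ascii_decode_error(data):
    """Try to decode bytes as ASCII; return the error message, or None if decoding succeeds."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# "é" becomes the two bytes 0xc3 0xa9 in UTF-8.
utf8_bytes = "café".encode("utf-8")
print(utf8_bytes)                              # b'caf\xc3\xa9'
print(ascii_decode_error(utf8_bytes))          # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
print(utf8_bytes.decode("utf-8") == "café")    # True
```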

Chintan Shah
  • Could the difference be not the environment (server / local machine) but the data? I mean could it be the data used as input on your server is not the same as the one you used for the tests on your machine? Can you add a print / log to get the faulty data, at least to rule out the environment and see if you can reproduce locally? – Jérôme Mar 15 '16 at 12:56
  • Could be related to http://stackoverflow.com/questions/24475393/unicodedecodeerror-ascii-codec-cant-decode-byte-0xc3-in-position-23-ordinal – Jérôme Mar 15 '16 at 12:57
    Your locale is set to `C`, definitely. – Antti Haapala -- Слава Україні Mar 15 '16 at 12:59
  • @Jérôme Hi, I checked the data too; it's exactly identical. That was my first suspicion as well. – Chintan Shah Mar 15 '16 at 14:23

1 Answer


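Python 3 derives the default character encoding from the locale settings unless told otherwise: `open()` called without an explicit `encoding` argument uses the locale's preferred encoding, and so do the standard streams unless overridden. Presumably the locale on your production server is not a UTF-8 locale (for example the plain `C` locale, whose encoding is ASCII), so Python reads `bigrams.txt` as ASCII and fails on the first non-ASCII byte.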
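To check what Python actually picks up on each machine, you can print the relevant encodings. This is a small diagnostic sketch; `report_encodings` is a hypothetical helper name. Judging by the traceback, wordsegment presumably opens `bigrams.txt` without an explicit encoding, so the first value is the one that matters here. Under the `C` locale on Linux with Python 3.4 it is usually an ASCII alias such as `ANSI_X3.4-1968`.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python 3 derives from the environment (hypothetical helper)."""
    return {
        # Default for open() when no encoding= argument is given;
        # presumably what wordsegment's file read fell back to.
        "open() default": locale.getpreferredencoding(False),
        # Encoding of sys.stdout; PYTHONIOENCODING overrides this one.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, encoding in report_encodings().items():
    print(name, "->", encoding)
```

Running it on both the local machine and the server should show whether the first value differs between them.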

You can force the encoding of the standard streams (stdin, stdout and stderr) by using the PYTHONIOENCODING environment variable; for example

PYTHONIOENCODING=UTF-8 python myprogram.py

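Note, however, that PYTHONIOENCODING does not change the default encoding used by `open()`, which is what fails in this traceback. The fix here is to run the program under a proper UTF-8 locale such as C.UTF-8 or en_US.UTF-8 (for example by setting LANG or LC_ALL), provided that locale is installed on the server.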
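Setting the locale changes what `open()` falls back to. Where you control the code that reads the data, you can also remove the dependency on the locale by passing `encoding` explicitly. The sketch below mirrors the tab-separated parsing visible in the traceback; `parse_counts` and the sample data are hypothetical, and this is not wordsegment's actual source. For the installed wordsegment, which opens the file itself, the locale fix remains the practical option.

```python
import os
import tempfile


def parse_counts(path):
    """Parse 'word<TAB>count' lines into a dict (hypothetical helper).

    Passing encoding="utf-8" makes the result independent of the
    locale's preferred encoding.
    """
    with open(path, encoding="utf-8") as fptr:
        lines = (line.split("\t") for line in fptr)
        # float() ignores the trailing newline on each count.
        return dict((word, float(number)) for word, number in lines)


# Write a small sample file containing a non-ASCII word (hypothetical data).
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("the end\t100\n")
    tmp.write("café au\t7\n")
    sample_path = tmp.name

try:
    counts = parse_counts(sample_path)
    print(sorted(counts.values()))  # [7.0, 100.0]
finally:
    os.remove(sample_path)
```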

  • Thanks, I have been trying to figure this out for so long! Where can I read more about this problem so I understand it better? – Chintan Shah Mar 15 '16 at 14:32