2

This is what I am seeing:

Traceback (most recent call last):
  File "/home/user/tools/executeJobs.py", line 86, in <module>
    owner = re.sub('^(AS[0-9]+ )', '', str(element[2]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 13: ordinal not in range(128)

The traceback already shows the line in question. Calls like str(array[0]) never failed for me before. How can I work around this? A quick and dirty solution is fine.

Update:

element[2] comes from this binary .dat list: http://github.com/maxmind/geoip-api-php/blob/master/tests/data/… It is also available here: http://dev.maxmind.com/geoip/legacy/geolite (the IP/ASN one at the bottom of the table)

Stephan Kristyn

2 Answers

1

\xe7 appears to be c with cedilla (ç) in the latin1 encoding.

So, assuming you have a unicode string, u"\xe7".encode("latin1") should give you the bytestring "\xe7". You could also encode it as UTF-8: u"\xe7".encode("utf8") would give you the bytestring "\xc3\xa7". That may or may not fix your issue, but it will definitely give you a different error.
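As a small sketch of the encodings mentioned above (the variable names are mine, purely for illustration), this shows the two explicit encodings and why the ascii codec fails on this character. In Python 2, str() on a unicode object implicitly encodes with the ascii codec, which is what raises the UnicodeEncodeError in the traceback.

```python
text = u"\xe7"  # c with cedilla

# Explicit encodings succeed and produce different byte strings.
latin1_bytes = text.encode("latin1")  # one byte: 0xe7
utf8_bytes = text.encode("utf8")      # two bytes: 0xc3 0xa7

# The ascii codec cannot represent the character at all.
# Python 2's str(text) does this same encode step implicitly.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```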

For a quick and dirty solution, drop the str() call and catch the error:

try:
    owner = re.sub('^(AS[0-9]+ )', '', element[2])
except TypeError as e:
    print "Weird:",element
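A slightly less dirty variant of the same idea might look like the sketch below. It is only a sketch under some assumptions: the helper name strip_as_prefix is hypothetical, I am guessing that any byte strings from the .dat file are latin1, and I am guessing that the "expected string or buffer" TypeError means some records are None, which I treat as an empty owner. The sample values are made up.

```python
import re

# Same pattern as in the question, compiled once as a unicode pattern.
AS_PREFIX = re.compile(u"^(AS[0-9]+ )")


def strip_as_prefix(value):
    """Return value as unicode text without a leading 'AS<number> ' prefix."""
    if value is None:
        # Assumption: a missing record is treated as an empty owner.
        return u""
    if isinstance(value, bytes):
        # Assumption: byte strings from the data file are latin1 encoded.
        value = value.decode("latin1")
    # No str() call here, so no implicit ascii encoding takes place.
    return AS_PREFIX.sub(u"", value)


owner = strip_as_prefix(u"AS64496 Example Fran\xe7ais Telecom")
```

Keeping the value as unicode and encoding only when writing it out (for example with .encode("utf8")) avoids the implicit ascii step entirely.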
Joran Beasley
  • Could I do `except ValueError as e: owner = element.encode("utf-8")` ? Should it say `e` instead of `element` there? – Stephan Kristyn Jan 30 '15 at 00:29
  • Tests failed with: `File "/home/user/tools/repository/nvla/executeJobs.py", line 87, in owner = re.sub('^(AS[0-9]+ )', '', element[2]) File "/usr/lib/python2.7/re.py", line 151, in sub return _compile(pattern, flags).sub(repl, string, count) TypeError: expected string or buffer` – Stephan Kristyn Jan 30 '15 at 00:58
  • oops ... do `except TypeError` instead of valueerror ... changing it in the answer now – Joran Beasley Jan 30 '15 at 00:58
-3

I've always used

s.replace(u'\xa0',' ')

In your case, it should look something like this, with 'whatever' standing in for the replacement text you want:

s.replace(u'\xe7','whatever')
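If more than one non-ASCII character can show up, replacing them one by one gets tedious. A rough sketch of some catch-all alternatives, using only the standard library (the sample string is made up):

```python
import unicodedata

s = u"Fran\xe7ois\xa0Telecom"  # made-up sample with two non-ASCII characters

# Replace every character the ascii codec cannot encode with '?'.
replaced = s.encode("ascii", "replace")

# Silently drop every character the ascii codec cannot encode.
ignored = s.encode("ascii", "ignore")

# Decompose accented characters first, so the base letter survives
# and only the combining accent is dropped.
unaccented = unicodedata.normalize("NFKD", s).encode("ascii", "ignore")
```

All three lose information, so they only fit cases where a plain ASCII result is really what you need.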
exhoosier10
  • He did say quick and dirty. @Sir Ben Benji, I'd imagine there is a better, more all-encompassing solution out there that might be worth waiting for if you want to handle more than just this one character – exhoosier10 Jan 30 '15 at 00:20
  • Yes, but I have to say there is lots of input and there might be other `\xcodes` as well. – Stephan Kristyn Jan 30 '15 at 00:23