I have one more error to fix.

row = OpenThisLink + titleTag + JD
try:
    csvwriter.writerow([row])
except (UnicodeEncodeError, UnicodeDecodeError):
    pass

This gives the error (for this character: "ń")

row = OpenThisLink + str(titleTag) + JD
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 51: ordinal not in range(128)

I tried to fix this by using the method here. But,

>>> title = "hello Giliciński"
Unsupported characters in input
>>> u = unicode(title, "latin1")

Traceback (most recent call last):
   File "<pyshell#56>", line 1, in <module>
     u = unicode(title, "latin1")
NameError: name 'title' is not defined
>>> title = "ń"
Unsupported characters in input

According to documentation:

Unlike a similar case with UnicodeEncodeError, such a failure cannot be always avoided.

And indeed, my exception doesn't seem to work. Any suggestions?

Thanks!

Zeynel

2 Answers

And indeed, my exception doesn't seem to work. Any suggestions?

The statement row = OpenThisLink + titleTag + JD is outside the try/except block, so any exception raised while it runs will not be caught. Moving it inside the block will catch the exception:

try:
    row = OpenThisLink + titleTag + JD
    csvwriter.writerow([row])
except (UnicodeEncodeError, UnicodeDecodeError):
    print "Caught unicode error"

However, in the code that you posted, row = OpenThisLink + titleTag + JD will not raise UnicodeEncodeError if titleTag contains a unicode string; the result of the string concatenation will simply be of type unicode.
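
The traceback in the question shows str(titleTag) rather than plain concatenation. In Python 2, str() applied to a unicode object encodes it implicitly with the ASCII codec, and that is what fails for a non-ASCII character. Here is a minimal sketch of the same failure, using an explicit encode("ascii") so that it behaves the same way under Python 3. The names safe_ascii and title_tag are hypothetical and not part of the original code:

```python
# -*- coding: utf-8 -*-

def safe_ascii(text):
    """Return text encoded as ASCII bytes, or None if it cannot be encoded."""
    try:
        # The failing operation has to sit inside the try block to be caught.
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


# Hypothetical sample value containing U+0144 (n with acute).
title_tag = u"hello Gilici\u0144ski"

print(safe_ascii(title_tag))  # the non-ASCII character cannot be encoded
print(safe_ascii(u"hello"))   # pure ASCII text encodes without error
```

Returning None is only one way to handle the failure. In practice it is usually better to encode with UTF-8 so that no data is lost.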

Now, the csv module in Python 2 doesn't support unicode, so calling writerow() with unicode data that contains non-ASCII characters will raise UnicodeEncodeError. You need to encode your unicode strings into a suitable encoding (UTF-8 would be best) and then pass the result to writerow(), for example:

>>> titleTag = "hello Giliciński"
>>> titleTag
'hello Gilici\xc5\x84ski'
>>> type(titleTag)
<type 'str'>
>>>
>>> titleTag = titleTag.decode('utf8')
>>> titleTag
u'hello Gilici\u0144ski'
>>> type(titleTag)
<type 'unicode'>
>>>
>>> csvwriter.writerow([titleTag])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0144' in position 12: ordinal not in range(128)
>>>
>>> # but this will work...
>>> csvwriter.writerow([titleTag.encode('utf8')])

The relevant Python documentation is here. Be sure to look at the examples, in particular the last one.
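
When a row has several cells, the encoding step can be applied to each one before the row is handed to writerow(). The following is only a sketch under some assumptions: encode_row is a hypothetical helper, not part of the csv module, and it relies on the bytes name, which exists from Python 2.6 onward (as an alias for str).

```python
# -*- coding: utf-8 -*-

def encode_row(cells, encoding="utf-8"):
    """Return a new list in which every text cell is encoded to bytes.

    Cells that are already byte strings are passed through unchanged,
    so mixing str and unicode values in Python 2 does not trigger an
    implicit ASCII decode.
    """
    encoded = []
    for cell in cells:
        if isinstance(cell, bytes):
            encoded.append(cell)
        else:
            encoded.append(cell.encode(encoding))
    return encoded


# Hypothetical sample row: one unicode cell, one cell that is already bytes.
row = encode_row([u"hello Gilici\u0144ski", b"plain"])
print(row)
```

Under Python 2 the result could then be written with csvwriter.writerow(encode_row([OpenThisLink, titleTag, JD])), which puts each value in its own column instead of concatenating them into a single cell.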

BTW, pyshell doesn't seem to accept non-ASCII characters as input, so use the normal Python interpreter.

mhawke

For IDLE, according to the solution here (link), open the file $python/Lib/idlelib/IOBinding.py and forcibly put

encoding = "utf-8"

after the try-except-pass block that sets the locale. Save the file (this may require administrator privileges), close IDLE, and open it again. At least it works for me. My IDLE version is 1.2, Python 2.5.

MeadowMuffins