My Python 2.7 script runs fine when extracting company names from local HTML files, but for names from some specific countries it fails with "UnicodeEncodeError: 'ascii' codec can't encode character".

Specifically, the error occurs when this company name comes up:

Company Name: Kühlfix Kälteanlagen Ing.Gerhard Doczekal & Co. KG

The link cannot be processed

Traceback (most recent call last): 
  File "C:\Python27\Process2.py", line 261, in <module>
    flog.write("\nCompany Name: "+str(pCompanyName))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 9: ordinal not in range(128)

The error is raised by the flog.write line in this block of code:

if companyAlreadyKnown == 0:
   for hit in soup2.findAll("h1"):
       print "Company Name: "+hit.text
       pCompanyName = hit.text
       flog.write("\nCompany Name: "+str(pCompanyName))
       companyObj.setCompanyName(pCompanyName)
rhb1
  • Read http://bit.ly/unipain – Daenyth Jun 30 '15 at 13:55
  • Anybody coming here should visit http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script and http://stackoverflow.com/questions/28657010/dangers-of-sys-setdefaultencodingutf-8; doing what is suggested in the accepted answer is usually, if not always, a **very bad idea**. – Padraic Cunningham Aug 24 '16 at 22:25
  • Wherever you are writing to a file or reading from a file, you have to add an encoding: open("filename", "w", encoding="UTF-8") – Reihan_amn Feb 21 '18 at 01:58

2 Answers

Try setting the system default encoding to UTF-8 at the start of the script, so that implicit conversions between unicode and str use UTF-8 instead of ASCII.

Example:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

The above sets the default encoding to UTF-8. This applies to Python 2 only; sys.setdefaultencoding does not exist in Python 3.
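To illustrate what the default encoding affects: in Python 2, calling str() on a unicode object encodes it with the default codec, which is ASCII unless changed. The following is only a minimal sketch that reproduces the failure with an explicit ASCII encode, so it behaves the same on Python 2 and Python 3. The variable names and the shortened sample name are assumptions for illustration, not taken from the original script.

```python
# -*- coding: utf-8 -*-
# Sample text containing u'\xfc' (ü), the character named in the traceback.
company_name = u"K\xfchlfix K\xe4lteanlagen"

# In Python 2, str(company_name) performs this ASCII encode implicitly.
try:
    company_name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    failing_position = exc.start  # index of the first character ASCII cannot represent

# Encoding explicitly with UTF-8 succeeds and yields a byte string.
utf8_bytes = company_name.encode("utf-8")
```

The explicit encode makes the codec visible at the call site, which is what the comments below recommend over changing the interpreter-wide default.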

Anand S Kumar
  • now another error i am facing bro ! Traceback (most recent call last): File "C:\Python27\Process2.py", line 261, in <module> print "Company Name: "+hit.text File "C:\Python27\lib\encodings\cp437.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeEncodeError: 'charmap' codec can't encode character u'\xae' in position 2 8: character maps to <undefined> – rhb1 Jul 01 '15 at 05:51
  • You have changed the encoding to something else now - `charmap` so the issue occurs. – Anand S Kumar Jul 01 '15 at 08:23
  • You may be the most intelligent python dev I've ever encountered. Was it really that hard.. – kubudi Jan 23 '16 at 12:48
  • This works for Python 2.x, but it's not a great way to go about this and is deprecated in Python 3. Better to actually decode/encode the data properly. See discussion at [Why should we NOT use sys.setdefaultencoding(“utf-8”) in a py script](http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script) – MartyMacGyver Jun 27 '16 at 00:07
  • @MartyMacGyver, totally correct, this can break libraries and cause hard to find bugs. The fact that this is not even mentioned makes this a very dangerous answer. – Padraic Cunningham Aug 24 '16 at 22:26
  • Works like magic. Cant Thank Enough! – Rudresh Ajgaonkar Oct 06 '16 at 19:38
  • Ah after half an hour of shuffling encode and decode and str statements in my script. Thank you very much. – Siddhartha Oct 27 '16 at 06:37
  • I agree with the other commenters that this is a potentially dangerous answer and you should instead decode/encode properly. However if the error is being raised from a library you have no control over (in my case pyspark), this is a handy short term work around. – Kyle Kochis Jan 12 '17 at 22:33
  • This does not work: `module 'sys' has no attribute 'setdefaultencoding'` – Dima Lituiev Mar 02 '17 at 03:01
  • you're the man! – Arnaud Bouchot Mar 24 '17 at 16:26
  • AttributeError: module 'sys' has no attribute 'setdefaultencoding' – Yuseferi Oct 18 '17 at 18:12
  • using `export PYTHONIOENCODING=UTF-8` works for me – Yuseferi Oct 18 '17 at 22:37
  • OMG I'm getting the f****** error for 3 days, and couldn't deal with it until set your code... It was so frustrating... THANKS YOU VERY MUCH! – M. Mariscal Nov 18 '17 at 11:27
  • Nice and pretty simple, don't have to mess with encoding and decoding. – Krishan Kumar Mourya Jun 08 '18 at 08:11

What you really want to do is this:

flog.write("\nCompany Name: "+ pCompanyName.encode('utf-8'))

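This is the "encode late" strategy described in this Unicode presentation (slides 32 through 35): keep text as unicode inside the program and encode it to bytes only at the point where it is written out.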
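One way to apply the same idea without calling .encode() at every write is to let the file object do the encoding. The sketch below assumes the log file can be opened with io.open, which accepts an encoding argument on both Python 2.7 and Python 3. The log path is hypothetical and created in a temporary directory for the demo; how the original script opens flog is not shown in the question.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# The company name from the question, written with escapes for ü and ä.
company_name = u"K\xfchlfix K\xe4lteanlagen Ing.Gerhard Doczekal & Co. KG"

# Hypothetical log location, used only for this demonstration.
log_path = os.path.join(tempfile.mkdtemp(), "process.log")

# The file object encodes unicode text to UTF-8 at write time ("encode late").
with io.open(log_path, "w", encoding="utf-8") as flog:
    flog.write(u"\nCompany Name: " + company_name)

# Read the file back, decoding from UTF-8, to confirm the round trip.
with io.open(log_path, "r", encoding="utf-8") as flog:
    logged = flog.read()
```

With this approach every string passed to flog.write must be unicode (note the u prefix on the literal), so the str() call from the question is no longer needed.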