I need to perform a Google search and retrieve the number of results for a query. I found the answer here: Google Search from a Python App

However, for a few queries I get the error below. I think those queries contain Unicode characters.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 28: ordinal not in range(128)

I searched Google, found that I need to convert Unicode to ASCII, and found the code below.

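import unicodedata

def convertToAscii(text, action):
    # action is the codec error handler, e.g. 'ignore' or 'xmlcharrefreplace'
    try:
        # Decode the UTF-8 byte string into a unicode object
        temp = unicode(text, "utf-8")
        # Decompose accented characters, then encode to ASCII
        fixed = unicodedata.normalize('NFKD', temp).encode('ASCII', action)
        return fixed
    except Exception, errorInfo:
        print errorInfo
        print "Unable to convert the Unicode characters to xml character entities"
        raise  # re-raise with the original traceback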

If I use the action 'ignore', it removes those characters, but if I use any other action, I get exceptions.

Any idea how to handle this?

Thanks

== Edit == I am using the code below to encode the query before performing the search, and this is what throws the error.

query = urllib.urlencode({'q': searchfor})

Boolean

2 Answers

You cannot pass raw Unicode strings containing non-ASCII characters to urllib.urlencode. Encode them to UTF-8 first, then pass the resulting byte string:

query = urllib.urlencode({'q': u"München".encode('UTF-8')})

This returns q=M%C3%BCnchen, which Google happily accepts.
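
If the query has several parameters, the same idea can be wrapped in a small helper. This is only a sketch: build_query and text_type are hypothetical names, not part of urllib, and the fallback import is an assumption added so the snippet also runs on Python 3 (the question itself is Python 2).

```python
try:
    from urllib import urlencode          # Python 2, as used in the question
except ImportError:
    from urllib.parse import urlencode    # Python 3 fallback (assumption)

text_type = type(u'')  # unicode on Python 2, str on Python 3


def build_query(params):
    """Hypothetical helper: UTF-8 encode every text value, then urlencode."""
    encoded = {}
    for key, value in params.items():
        if isinstance(value, text_type):
            # Encode text to UTF-8 bytes so urlencode never has to guess
            value = value.encode('utf-8')
        encoded[key] = value
    return urlencode(encoded)


# u'M\xfcnchen' is the same word used in the answer, written with an escape
query = build_query({'q': u'M\xfcnchen'})
print(query)
```

Values that are already byte strings or numbers are passed through unchanged, so only the text values are touched.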

9000

You can't safely convert Unicode to ASCII. Doing so throws away information (specifically, every character outside the ASCII range, which includes accented and other non-English letters).

You should be doing the entire process in Unicode, so as not to lose any information.
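
A small sketch of the difference, using a made-up sample string (the variable names are illustrative only): the NFKD-plus-'ignore' approach from the question silently drops characters, while a UTF-8 round trip returns the original text.

```python
import unicodedata

# Sample text with a sharp s (\xdf) and a u-umlaut (\xfc); chosen for illustration
text = u'Stra\xdfe in M\xfcnchen'

# Lossy: decompose, then drop everything that is not ASCII
ascii_bytes = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
# ascii_bytes is b'Strae in Munchen': the sharp s is gone, the umlaut stripped

# Lossless: encode to UTF-8 at the boundary, decode when reading back
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')

print(ascii_bytes.decode('ascii') == text)  # False, information was lost
print(round_trip == text)                   # True, nothing was lost
```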

Turtle
  • I am using query = urllib.urlencode({'q': searchfor}) and it throws the error. Is there any way to perform the search on the Unicode string itself? – Boolean Jan 24 '11 at 01:45