
OK, I know there are already many questions on this topic, but reading them hasn't helped me solve my problem.

My webpage contains the text " hello'© ". My goal is to fetch this content as JSON, strip "hello", and write the remaining characters, i.e. "'©", back to the page.

I am using a cURL POST request to write the content back to the webpage. My code for fetching the JSON is:

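import base64
import json
import urllib2

request = urllib2.Request("http://XXXXXXXX.json")
user = 'xxx'
password = 'xxx'
# b64encode, unlike encodestring, does not append a newline to the header value
base64string = base64.b64encode('%s:%s' % (user, password))
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request)   # send the URL request
newjson = json.loads(result.read().decode('utf-8'))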
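For readers on Python 3, where `urllib2` no longer exists, the same fetch might look roughly like the sketch below. It is an illustrative translation, not the asker's code: the helper names `build_request` and `fetch_json`, and the credentials used in the test, are hypothetical. `base64.b64encode` is used because it produces the token without the trailing newline that Python 2's `encodestring` appends.

```python
import base64
import json
import urllib.request


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    # b64encode returns bytes with no trailing newline; decode to str for the header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_json(url: str, user: str, password: str):
    """Fetch a URL and parse its body as UTF-8 encoded JSON."""
    with urllib.request.urlopen(build_request(url, user, password)) as response:
        # response.read() returns bytes; decode them to text before parsing.
        return json.loads(response.read().decode("utf-8"))
```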

At this point, the text in `newjson` is a unicode string. I discovered that my cURL POST request only works with percent-encoding (like "%A3" for £).
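Note that "%A3" is the percent-encoding of the single Latin-1 (ISO-8859-1) byte for £; the UTF-8 encoding of the same character would be "%C2%A3". Which one the server expects is an assumption here, inferred only from the "%A3" example. A minimal Python 3 sketch using the standard `urllib.parse.quote` (the wrapper name `percent_encode` is hypothetical):

```python
from urllib.parse import quote


def percent_encode(text: str, charset: str = "latin-1") -> str:
    """Percent-encode every character except unreserved ASCII ones.

    charset="latin-1" reproduces the "%A3" style from the question;
    pass "utf-8" if the server expects UTF-8 instead.
    """
    # safe="" means "/" is encoded as well; letters, digits and "_.-~" are left alone.
    return quote(text, safe="", encoding=charset)
```

With this, `percent_encode("'©")` gives `%27%A9`, while `percent_encode("£", "utf-8")` gives `%C2%A3`. A character outside Latin-1, such as ™, raises `UnicodeEncodeError` with the default charset.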

What is the best way to do this? The code I wrote is as follows:

import subprocess

encode_dict = {'!':'%21',
               '"':'%22',
               '#':'%23',
               '$':'%24',
               '&':'%26',
               '*':'%2A',
               '+':'%2B',
               '@':'%40',
               '^':'%5E',
               '`':'%60',
               '©':'%A9',
               '®':'%AE',
               '™':'%99',
               '£':'%A3'
              }
encoded = text1
for letter in text1:
    print(letter)
    for keyz, valz in encode_dict.iteritems():
        if letter == keyz:   # the UnicodeWarning below is raised on this line
            encoded = encoded.replace(letter, valz)
path = "xxxx"
# -H is needed so that curl sends the string as a header instead of treating it as a URL
subprocess.Popen(['curl', '-u', 'xxx:xxx', '-H', 'Content-Type: text/html', '-X', 'POST', '--data', "text=" + encoded, path])

This code gives me a warning at `if letter == keyz:`:

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
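If a hand-written table is still wanted, one way to avoid both the mixed-type comparison and the character-by-character loop is to keep every key as text and let `str.translate` do the replacement in a single pass. This is a Python 3 sketch; the table contents follow the question's dictionary with the corrected codes for # and $. ™ is left out because it has no Latin-1 byte (its "%99" is the Windows-1252 code), and "%" is added so that a literal percent sign cannot be confused with an escape. Because `translate` makes one pass, already-inserted "%" characters are never re-encoded.

```python
# Hypothetical table: keys are single text characters, values are Latin-1 percent-escapes.
ENCODE_TABLE = str.maketrans({
    "%": "%25",
    "!": "%21",
    '"': "%22",
    "#": "%23",
    "$": "%24",
    "&": "%26",
    "*": "%2A",
    "+": "%2B",
    "@": "%40",
    "^": "%5E",
    "`": "%60",
    "©": "%A9",
    "®": "%AE",
    "£": "%A3",
})


def encode_text(text: str) -> str:
    """Replace each character found in ENCODE_TABLE; leave all others unchanged."""
    return text.translate(ENCODE_TABLE)
```

For example, `encode_text("'©")` returns `'%A9` and `encode_text("100% & £")` returns `100%25 %26 %A3`.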

Is there a better way to do this?
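A shorter route, sketched here for Python 3, is to let the standard library build the whole form body instead of maintaining a table. `urllib.parse.urlencode` percent-encodes every reserved and non-ASCII character; the Latin-1 charset is again an assumption based on the "%A3" example, and `build_curl_command` and `post_text` are hypothetical names. With `--data`, curl already sends a POST with `Content-Type: application/x-www-form-urlencoded`, so neither `-X POST` nor a custom header is needed.

```python
import subprocess
from typing import List
from urllib.parse import urlencode


def build_curl_command(text: str, path: str, credentials: str) -> List[str]:
    """Return a curl argument list that POSTs `text` as the form field "text"."""
    # urlencode quotes reserved and non-ASCII characters (space becomes "+").
    body = urlencode({"text": text}, encoding="latin-1")
    return ["curl", "-u", credentials, "--data", body, path]


def post_text(text: str, path: str, credentials: str) -> None:
    """Run curl; passing a list avoids shell quoting, check=True raises on failure."""
    subprocess.run(build_curl_command(text, path, credentials), check=True)
```

For the question's input, the body becomes `text=%27%A9`.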

koolkat
  • Is one of your arguments in `if letter == keyz:` a normal (non-unicode) `string`? See http://stackoverflow.com/questions/18193305/python-unicode-equal-comparison-failed – Paul Rooney Feb 25 '15 at 01:38
  • Yes, 'keyz' and 'valz' are just strings. I tried converting them to unicode with keyz = unicode(keyz, 'utf-8') and valz = unicode(valz, 'utf-8'). It gave me this error: TypeError: decoding Unicode is not supported – koolkat Feb 25 '15 at 01:58
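What the comments describe: in Python 2, a literal such as '©' is a byte string (its UTF-8 bytes, assuming the source file is saved as UTF-8), while each `letter` taken from the decoded JSON is unicode. To compare them, Python 2 tries to decode the byte string with the default ASCII codec, fails, emits the UnicodeWarning and treats the two as unequal. The TypeError appears when `unicode(x, 'utf-8')` is called on a value that is already unicode. Python 3 keeps the same distinction as `str` versus `bytes`, as this small illustration shows (variable names are illustrative):

```python
letter = "©"                     # text, like each character of the decoded JSON string
key_bytes = "©".encode("utf-8")  # bytes, like a plain '©' literal in Python 2 source

mixed_type = letter == key_bytes                 # text vs bytes: never equal
same_type = letter == key_bytes.decode("utf-8")  # decode exactly once, then compare


def decoding_text_fails() -> bool:
    """Decoding something that is already text raises TypeError."""
    try:
        str(letter, "utf-8")
    except TypeError:
        return True
    return False
```

The rule in both versions is to decode bytes once, at the boundary, and compare only text with text.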

1 Answer


The problem was the encoding. `result.read()` returns a byte string, which has to be decoded to unicode with the `decode()` function. Then I replaced all non-ASCII characters by encoding the unicode string as ASCII with `encode('ascii', 'xmlcharrefreplace')`, which turns each of them into an XML character reference (for example, © becomes &#169;).

newjson = json.loads(result.read().decode('utf-8').encode("ascii","xmlcharrefreplace"))
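A Python 3 sketch of the same idea, with one adjustment: the escaping is applied after `json.loads()` rather than before it. If the server writes © as the JSON escape `\u00a9`, that text is already ASCII, so escaping before parsing leaves it alone and `json.loads()` turns it back into ©; escaping the parsed string covers both cases. The field name "text" and the strip-"hello" step are hypothetical stand-ins for the asker's data. Also note that "&" and "#" in `&#169;` must themselves be percent-encoded before posting as form data, or "&" will be read as a field separator.

```python
import json


def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with an XML numeric reference, e.g. '©' -> '&#169;'."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def remaining_text(raw: bytes, field: str = "text") -> str:
    """Parse UTF-8 JSON bytes, drop the leading 'hello', and escape what is left."""
    value = json.loads(raw.decode("utf-8"))[field]
    return escape_non_ascii(value.strip().replace("hello", "", 1))
```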

Learning the basics of Unicode also helped me a great deal; this is an excellent tutorial.

koolkat