
I have an Android app which uses

URLEncoder.encode(S.getSongArtist(),"UTF-8")

to encode a Unicode string that is posted to an App Engine Python (2.7) web service. On the service side I decode it with

urllib.unquote_plus(artist)

This is not giving me correct results. I have an input like this:

Marie+Lafor%C3%AAt

which is unquoted to

Marie LaforÃªt

If I use a JavaScript URL decoder instead, for instance http://meyerweb.com/eric/tools/dencoder/, I get

Marie Laforêt

That is the correct result.
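Below is a sketch of where the garbled text comes from. It is written for Python 3 rather than the question's Python 2.7. That is an assumption made so the example runs on a current interpreter, and Python 3's `urllib.parse.unquote_plus` takes an explicit `encoding` argument that makes the difference visible.

In Python 2.7, `urllib.unquote_plus` applied to a `unicode` argument turns each `%XX` escape into the character with code point `0xXX`. That is the same as reading the bytes as Latin-1, so the two UTF-8 bytes of `ê` come out as the two characters `Ã` and `ª`:

```python
from urllib.parse import unquote_plus

# Value from the question: '+' encodes a space, and %C3%AA is the
# two-byte UTF-8 encoding of 'ê'.
QUERY_VALUE = 'Marie+Lafor%C3%AAt'

# Treating each escaped byte as one Latin-1 character reproduces the
# garbled result (assumed to match Python 2.7's unicode code path).
wrong = unquote_plus(QUERY_VALUE, encoding='latin-1')

# Decoding the escaped bytes as UTF-8 gives the intended text.
right = unquote_plus(QUERY_VALUE, encoding='utf-8')

# ascii() shows the code points without depending on the terminal encoding.
print(ascii(wrong))  # 'Marie Lafor\xc3\xaat'
print(ascii(right))  # 'Marie Lafor\xeat'
```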

I tried using

urllib.unquote(artist).decode('utf-8') 

but this raises an exception. Any hints are greatly appreciated.
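The `UnicodeEncodeError` reported in the comments likely has the same root cause. On the server, `artist` is probably already a `unicode` object. In Python 2, calling `.decode('utf-8')` on a `unicode` object first encodes it with the default ASCII codec, and ASCII cannot represent the non-ASCII characters the unquote step produced.

The sketch below performs that hidden encode step explicitly and reproduces the positions in the reported message. It is written for Python 3 as an assumption, and the helper `implicit_ascii_encode` is hypothetical.

```python
from urllib.parse import unquote_plus

# Text unquoted the way Python 2.7 handles a unicode argument
# (one Latin-1 character per escape): 'Ã' and 'ª' sit at indexes 11 and 12.
garbled = unquote_plus('Marie+Lafor%C3%AAt', encoding='latin-1')


def implicit_ascii_encode(text):
    """Run the ASCII encode that Python 2's unicode.decode() performs first.

    Returns the resulting UnicodeEncodeError, or None if text is pure ASCII.
    """
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


error = implicit_ascii_encode(garbled)
print(error)
# 'ascii' codec can't encode characters in position 11-12: ordinal not in range(128)
```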

EDIT

Taxellool had the right answer in the comments:

What you are trying to decode is already decoded. Try this:

urllib.unquote_plus(artist.encode('utf-8')).decode('utf-8')
scratchy
  • What exception do you get in the last `urllib.unquote(artist).decode('utf-8')`? It seems to work correctly under Python 2.7.5. – ymonad Mar 18 '14 at 06:44
  • If I use decode at the end, I get: UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-12: ordinal not in range(128) – scratchy Mar 18 '14 at 10:14
  • What you are trying to decode is already decoded. Try this: `urllib.unquote_plus(artist.encode('utf-8')).decode('utf-8')` – Taxellool Mar 18 '14 at 10:21
  • I tried this in a Python shell (as Taxellool did) and it works: `>>> print urllib.unquote_plus(artist).decode('utf-8')` prints `Marie Laforêt`. On the server, this same line of code raises the exception. – scratchy Mar 18 '14 at 10:26
  • Taxellool, you are right: I tried the unquote(encode).decode version and that works. Thanks! – scratchy Mar 18 '14 at 10:34
  • @Taxellool: it should be `urllib.unquote_plus(artist.encode('ascii')).decode('utf-8')` – jfs Mar 18 '14 at 11:13
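jfs's version keeps the order that works:

1. Turn the value into ASCII bytes.
2. Percent-decode the bytes.
3. Only then decode UTF-8.

Encoding with `'ascii'` rather than `'utf-8'` is safe because a correctly percent-encoded value contains only ASCII characters, and it fails loudly if it does not. Here is a Python 3 sketch of the same pipeline. Python 3 is an assumption here, and `decode_form_value` is a hypothetical name.

```python
from urllib.parse import unquote_plus, unquote_to_bytes


def decode_form_value(artist):
    """Decode a form-encoded value the way the Python 2 fix does.

    Python 2 equivalent:
        urllib.unquote_plus(artist.encode('ascii')).decode('utf-8')
    """
    raw = artist.encode('ascii')      # text -> bytes; raises on non-ASCII input
    raw = raw.replace(b'+', b' ')     # form encoding uses '+' for a space
    raw = unquote_to_bytes(raw)       # %XX escapes -> raw bytes
    return raw.decode('utf-8')        # UTF-8 bytes -> text


artist = 'Marie+Lafor%C3%AAt'
print(ascii(decode_form_value(artist)))  # 'Marie Lafor\xeat'

# Python 3's unquote_plus decodes escapes as UTF-8 by default, so it
# agrees with the explicit pipeline for valid input.
print(unquote_plus(artist) == decode_form_value(artist))  # True
```

In Python 3 alone, `unquote_plus(artist)` already returns the correct text. The explicit steps mainly make the bytes/text boundary visible and add the strict ASCII check.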

Answers


I guess you are decoding before urllib.unquote():

>>> print urllib.unquote_plus('Marie+Lafor%C3%AAt'.decode('utf-8'))  
Marie LaforÃªt

If you decode after unquoting, the result is what you want:

>>> print urllib.unquote_plus('Marie+Lafor%C3%AAt').decode('utf-8')  
Marie Laforêt

Just make sure you don't pass a unicode object to urllib.unquote_plus.

Taxellool
  • On the server, this line of code raises an exception: UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-12: ordinal not in range(128) – scratchy Mar 18 '14 at 10:29
  • @scratchy: See [How to print() a string in Python3?](http://stackoverflow.com/q/22494825/4279). My answer works on Python 2 too. – jfs Mar 19 '14 at 09:45