
For example, I have the phrase

Olá mundo!

and I need it written as

ol%E1%20mundo!

to create my urls.

I need this to connect to some specific sites that use this kind of encoding.

One example of a site that uses this type of encoding is the Michaelis translator.

How can I create URLs with this type of encoding using Python? I tried urllib and urllib2, but I haven't had any success so far.

Here is another question of mine, related to this.

Appreciate the help, thanks.

GarouDan

1 Answer


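Here is one way using urllib's quote and the string's encode("utf8") (Python 2, after running import urllib). Note that this produces UTF-8 percent-encoding (%C3%A1), not %E1: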

In  [1]: url = u'Ol\xe1 mundo!'

In  [2]: url.encode("utf8")
Out [2]: 'Ol\xc3\xa1 mundo!'

In  [3]: print url.encode("utf8")
Olá mundo!

In  [4]: urllib.quote(url.encode("utf8"))
Out [4]: 'Ol%C3%A1%20mundo%21'

In  [5]: print urllib.quote(url.encode("utf8"))
Ol%C3%A1%20mundo%21

In  [6]: urllib.unquote(urllib.quote(url.encode("utf8")))
Out [6]: 'Ol\xc3\xa1 mundo!'

In  [7]: print urllib.unquote(urllib.quote(url.encode("utf8")))
Olá mundo!
chown
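
For readers on Python 3, the session above can be approximated as follows. This is only a minimal sketch, assuming the standard `urllib.parse` module; the variable names `phrase`, `encoded` and `decoded` are illustrative and not part of the original answer.

```python
from urllib.parse import quote, unquote

phrase = "Olá mundo!"

# For str input, quote() encodes the text as UTF-8 by default,
# so "á" becomes the two bytes C3 A1 and is emitted as %C3%A1.
encoded = quote(phrase)
print(encoded)

# unquote() decodes the percent-escapes as UTF-8 by default,
# which restores the original text.
decoded = unquote(encoded)
print(decoded)
```

As in the Python 2 session, the result is still UTF-8 percent-encoding, so it does not yet produce the `%E1` form the question asks for.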
  • This was very interesting, @chown, but I need something more. Note that *olá* has to appear as *ol%E1*. This *%E1* encoding is very strange; I hadn't seen it before. Can we produce it using urllib? – GarouDan Nov 21 '11 at 00:03
  • You might be able to. I just tried a few things but was only able to do it manually, by taking `u'\xe1'` and stripping the `u'\x` prefix, which is not a good idea at all =p. Check this out though: http://www.utf8-chartable.de/ - it shows that `U+00E1` is encoded in UTF-8 as `c3 a1`. – chown Nov 21 '11 at 00:28
  • It's true that `U+00E1` corresponds to `c3 a1` in UTF-8. I tried that on the site, but `Ol%c3%a1` doesn't work: the URL shows `olá` (fine), but the results show `Olá` (bad =/). `ol\xc3\xa1` and `ol\xE1` don't work either. It looks like only `ol%E1` works... If I use `olá` as input, how can I get `ol%e1` as output to build a URL =/? Thanks for the help. – GarouDan Nov 22 '11 at 10:43
  • An example of a site that uses this type of encoding is [here](http://michaelis.uol.com.br/moderno/ingles/index.php?lingua=portugues-ingles&palavra=ol%E1). – GarouDan Nov 22 '11 at 10:47
  • @G I'll try a few things and get back to ya. – chown Nov 22 '11 at 17:02
  • Interesting improvement, chown. Now the output prints nicely. Unfortunately it didn't work on the site, because it only recognizes things like `ol%E1`. In this [code](http://pastebin.com/rFh0d9CA) we can probably work out how it transforms `olá` into `ol%E1`, though I haven't found it yet. The site is [michaelis.uol.com.br](http://michaelis.uol.com.br/moderno/portugues/index.php?lingua=portugues-portugues&palavra=ol%E1). I should add that I want this to build a terminal translator for my own use only (I already have one, but words with non-ASCII characters don't work). – GarouDan Jun 11 '12 at 15:12
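
The `%E1` form discussed in the comments is what percent-encoding yields when `á` is first encoded as Latin-1 (ISO-8859-1), where it is the single byte `E1`, rather than as UTF-8. The sketch below shows one way this might be done in Python 3. It assumes the site expects Latin-1 bytes, which is inferred from the `ol%E1` example and not confirmed by the thread; the names `encoded`, `query` and `url` are illustrative, and the base URL and parameters are copied from the link in the comments.

```python
from urllib.parse import quote, unquote, urlencode

# Assumption: the target site decodes query strings as Latin-1.
# In Latin-1, "á" is the single byte 0xE1, hence %E1.
# safe="!" keeps the exclamation mark unescaped, as in the question.
encoded = quote("olá mundo!", safe="!", encoding="latin-1")
print(encoded)

# Decoding needs the same encoding to recover the text.
print(unquote(encoded, encoding="latin-1"))

# urlencode() accepts the same encoding argument when building
# a whole query string from key/value pairs.
query = urlencode(
    {"lingua": "portugues-ingles", "palavra": "olá"},
    encoding="latin-1",
)
url = "http://michaelis.uol.com.br/moderno/ingles/index.php?" + query
print(url)
```

In the Python 2 session from the answer, the corresponding change would be to call `url.encode("latin-1")` instead of `url.encode("utf8")` before passing the result to `urllib.quote`.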