2

I cannot request the URL "http://www.besondere-raumdüfte.de" with urllib2.urlopen().
I tried encoding the string using urllib.urlencode with utf-8, idna, and ascii, but it still doesn't work.
It raises URLError: <urlopen error unknown url type.

  • 1
    "ü" is not a "non-unicode character". Barely *any character* qualifies as a "non-unicode character", because Unicode covers pretty much everything character-like out there. It's a "non-ASCII character". – Joachim Sauer Mar 27 '12 at 10:11

2 Answers

2

What you need is u"http://www.besondere-raumdüfte.de/".encode('idna'). Please note how the source string is a Unicode constant (the u prefix).

The result is a URL usable with urlopen().
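As a quick check, here is a minimal sketch that encodes only the host name and confirms that the result is plain ASCII and decodes back to the original. The variable names are illustrative, and the snippet assumes the literal is a Unicode string (the u prefix is required on Python 2 and harmless on Python 3).

```python
# -*- coding: utf-8 -*-
host = u"www.besondere-raumdüfte.de"

# The idna codec converts each dot-separated label to its ASCII
# ("xn--...") form; labels that are already ASCII pass through unchanged.
ascii_host = host.encode("idna")

# Decoding with the same codec recovers the original Unicode host name.
round_trip = ascii_host.decode("idna")
```

Without the u prefix on Python 2, the literal is a byte string, and .encode('idna') first tries to decode it as ASCII, which fails on the "ü".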

If both the domain name and the rest of the URL contain non-ASCII characters, you need to .encode('idna') the domain part and apply iri2uri() to the rest.
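The split described above could look roughly like the sketch below. Note that it is an assumption-laden simplification: instead of httplib2's iri2uri() it uses the standard library's quote() to percent-encode the path, query, and fragment as UTF-8, it drops any user:password part of the URL, and the helper name iri_to_ascii_url is hypothetical.

```python
# -*- coding: utf-8 -*-
try:  # Python 3
    from urllib.parse import urlsplit, urlunsplit, quote
except ImportError:  # Python 2
    from urlparse import urlsplit, urlunsplit
    from urllib import quote


def iri_to_ascii_url(iri):
    """Hypothetical helper: build an ASCII-only URL from a Unicode IRI."""
    parts = urlsplit(iri)

    # Only the host name goes through the idna codec.
    # (Simplification: a user:password prefix, if any, is dropped.)
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)

    # Everything after the host is percent-encoded as UTF-8.
    # "%" is kept safe so already-escaped sequences are not escaped twice.
    path = quote(parts.path.encode("utf-8"), safe="/%")
    query = quote(parts.query.encode("utf-8"), safe="=&%")
    fragment = quote(parts.fragment.encode("utf-8"), safe="%")

    return urlunsplit((parts.scheme, host, path, query, fragment))


url = iri_to_ascii_url(u"http://www.besondere-raumdüfte.de/düfte?q=ü")
```

The resulting url contains only ASCII characters and can be passed to urlopen() as is.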

9000
  • Thanks, that works. I remember trying to encode with 'idna' and urllib2.urlopen the string **without u at the beginning**. Does the 'u' before the string really matter that much? – Abdyresul Charyev Mar 28 '12 at 09:58
  • @AbdyresulCharyev: oh yes, using a byte string instead of a Unicode string is one of the most frequent mistakes, I did it myself many times %) – 9000 Mar 28 '12 at 18:18
0

You are working with an IRI, not a URI, so you have to convert it first. The following example shows how to do it:

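# Python 2 code; requires the third-party httplib2 package.
from httplib2 import iri2uri

def iri_to_uri(iri):
    """Transform a Unicode IRI into an ASCII URI."""
    # A byte string would be mangled silently, so insist on unicode.
    if not isinstance(iri, unicode):
        raise TypeError('iri %r should be unicode.' % iri)
    # iri2uri() returns ASCII-only text; convert it to a byte string.
    return bytes(iri2uri(iri))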

Once you have a URI, you should be able to use it with urllib2.

mandel
  • To check, I tried `urllib2.urlopen(bytes(iri2uri("http://www.besondere-raumdüfte.de")))`, but it gives the error: URLError: – Abdyresul Charyev Mar 27 '12 at 10:12
  • You are probably behind a firewall, take a look at http://stackoverflow.com/questions/4847649/opening-websites-using-urllib2-from-behind-corporate-firewall-11004-getaddrinf – mandel Mar 27 '12 at 10:33
  • Funny, but it was because I forgot to put u before the string's opening quote :) After I added it, it worked. I guess u defines the string as a Unicode string. Thanks. – Abdyresul Charyev Mar 28 '12 at 11:32