I cannot request the URL "http://www.besondere-raumdüfte.de" with urllib2.urlopen().
I tried to encode the string using urllib.urlencode with utf-8, idna, and ascii, but it still doesn't work.
It raises URLError: <urlopen error unknown url type

Abdyresul Charyev
-
"ü" is not a "non-unicode character". Barely *any character* qualifies as a "non-unicode character", because Unicode covers pretty much everything character-like out there. It's a "non-ASCII character". – Joachim Sauer Mar 27 '12 at 10:11
2 Answers
What you need is u"http://www.besondere-raumdüfte.de/".encode('idna'). Please note that the source string is a Unicode constant (the u prefix).
The result is a URL usable with urlopen().
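To see what the idna codec does, here is a small sketch that encodes only the host name and then decodes it again. The variable names are just for illustration; the snippet should run on both Python 2 and Python 3 (where every string literal is already Unicode, so the u prefix is optional).

```python
# -*- coding: utf-8 -*-
# The u prefix makes this a Unicode string on Python 2 as well.
host = u"www.besondere-raumdüfte.de"

# Each non-ASCII label is converted to its ASCII "xn--" form.
ascii_host = host.encode("idna")
print(ascii_host)

# Decoding with the same codec gives back the original name.
print(ascii_host.decode("idna"))
```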
If you have a domain name with non-ASCII characters and the rest of the URL also contains non-ASCII characters, you need to .encode('idna') the domain part and iri2uri() the rest.
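A rough sketch of that split, assuming Python 3, where the helpers live in urllib.parse (on Python 2 they are spread across urlparse and urllib). It swaps in the standard library's quote() for iri2uri() on the non-domain parts, the name to_ascii_url is made up for illustration, and any user info in the URL is ignored.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Return an ASCII-only form of a URL that may contain non-ASCII characters.

    Hypothetical helper: IDNA for the host, percent-encoding for the rest.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host: %r" % (url,))

    # Domain part: IDNA-encode the host name only.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)

    # Everything else: percent-encode as UTF-8, keeping existing escapes.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(to_ascii_url("http://www.besondere-raumdüfte.de/düfte"))
```

The returned string is plain ASCII, so it can be handed to urlopen() directly.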

9000
-
Thanks, that works. I remember I tried to encode the string with 'idna' and pass it to urllib2.urlopen **without the u at the beginning**. Does the 'u' before the string really matter that much? – Abdyresul Charyev Mar 28 '12 at 09:58
-
@AbdyresulCharyev: oh yes, using a byte string instead of a Unicode string is one of the most frequent mistakes; I have done it myself many times %) – 9000 Mar 28 '12 at 18:18
You are working with an IRI, not a URI, so you have to convert it first. The following example shows one way to do it:
    from httplib2 import iri2uri

    def iri_to_uri(iri):
        """Transform a unicode IRI into an ASCII URI."""
        # Python 2: require a unicode string, not a byte string.
        if not isinstance(iri, unicode):
            raise TypeError('iri %r should be unicode.' % iri)
        return bytes(iri2uri(iri))
Once you have a URI, you should be able to use urllib2.

mandel
-
To check, I tried `urllib2.urlopen(bytes(iri2uri("http://www.besondere-raumdüfte.de")))`, but it gives the error: URLError:
– Abdyresul Charyev Mar 27 '12 at 10:12 -
You are probably behind a firewall, take a look at http://stackoverflow.com/questions/4847649/opening-websites-using-urllib2-from-behind-corporate-firewall-11004-getaddrinf – mandel Mar 27 '12 at 10:33
-
Funny, but it was because I forgot to put u before the string's opening quote :) After I added it, it worked. I guess u marks the string as a Unicode string. Thanks. – Abdyresul Charyev Mar 28 '12 at 11:32