I'm using Python 3.5.2 and I'm trying to automatically open a URL with parameters (many of them read from a csv-file). My problem is that one of the parameters contains the Norwegian letter "ø" in "Møre 2013" (see ...projects:"Møre%202013", where %20 is used to include a space between Møre and 2013), which causes an error message.

A bat-file runs lesurl.py with input parameters from a csv-file. My code in lesurl.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys              # to read the command-line arguments
import urllib.request   # to open the URL

fn=sys.argv[1]
minx = sys.argv[2]
miny = sys.argv[3]
maxx = sys.argv[4]
maxy = sys.argv[5]
pnavn = sys.argv[6]
c = sys.argv[7]
u = sys.argv[8]
pwd = sys.argv[9]
f = sys.argv[10] 

bestilling = urllib.request.urlopen('https://tjenester.norgeibilder.no/REST/StartExport.ashx?request={username:"'+u+'",password:"'+pwd+'",copyEmail:"",comment:"'+fn+'",coordInput:{type:"Polygon",coordinates:[[['+minx+','+maxy+'],['+maxx+','+maxy+'],['+maxx+','+miny+'],['+minx+','+miny+'],['+minx+','+maxy+']]},inputWkid:'+c+',cutNationalBorder:0,format:'+f+',resolution:'+r+',outputWkid:'+c+',fillColor:255,projects:"Møre%202013",imagemosaic:2}').read()
print(bestilling)

"Møre%202013" seems to cause an error:

Traceback (most recent call last):
  File "b_of_lesurl.py", line 28, in <module>
    bestilling = urllib.request.urlopen('https://tjenester.norgeibilder.no/REST/StartExport.ashx?request={username:"'+u+'",password:"'+pwd+'
",copyEmail:"",comment:"'+fn+'",coordInput:{type:"Polygon",coordinates:[[['+minx+','+maxy+'],['+maxx+','+maxy+'],['+maxx+','+miny+'],['+minx
+','+miny+'],['+minx+','+maxy+']]},inputWkid:'+c+',cutNationalBorder:0,format:'+f+',resolution:'+r+',outputWkid:'+c+',fillColor:255,projects
:"Møre%202013",imagemosaic:2}').read()

  File "C:\Users\ban\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\ban\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 466, in open
    response = self._open(req, data)
  File "C:\Users\ban\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 484, in _open
    '_open', req)
  File "C:\Users\ban\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\ban\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1297, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Users\ban\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Users\ban\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "C:\Users\ban\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1141, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Users\ban\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 983, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 344: ordinal not in range(128)

I have tried different variants of encode('utf-8'), like the one below, and then used ...projects:"'+s+'",... in urlopen:

s="Møre%202013"
print(s)
s=s.encode('utf-8')
print(s)

giving

Møre%202013

b'M\xc3\xb8re%202013'
b'M\xc3\xb8re%202013'

and I still get the encode error. How do I include "ø" correctly? (Btw, e.g. "Oslo 2015" works fine.)

Gerhard
9ls1
  • You are building a URL by string concatenation. Don't do that, for the exact same reasons why you shouldn't build SQL or HTML with string concatenation: missing character escaping. Also, use a higher abstraction than `urllib`. There is the `requests` module, use this. It does all the escaping for you and is much easier to use than the low-level modules. – Tomalak May 14 '18 at 19:30
  • So a few things... I think you need to percent-encode the string, as the set of valid URL characters is very limited. Your test string already has a %20, which would be a space, so I'm guessing the real string is s='Møre 2013'. Percent-encode it with urllib.parse.quote(s) >>> 'M%C3%B8re%202013'; to decode it: urllib.parse.unquote('M%C3%B8re%202013') >>> 'Møre 2013' – sehafoc May 14 '18 at 19:32
  • @sehafoc: Thanks, this does the trick for the time being (I will check out `requests`, mentioned by @Tomalak, as well). I have now put already percent-encoded values in my csv-file: `flatenr,llx,lly,urx,ury,prosjektnavn 04280594,647530,6783954,663618,6800042,"%C3%98stlandet%202013" 04320589,606175,6840380,622263,6856468,"Hedmark%20NORD%202015" 15340577,350730,6933982,366818,6950070,"M%C3%B8re%202013"` and I am then using `...projects:"'+s+'",...` in the urlopen call. – 9ls1 May 14 '18 at 19:41
  • I'm looking for a way to add the "projects" name in the urlopen call from a file, since I'm requesting around 20 images from the web site. The project names include æ, ø, å and spaces, e.g. "Østlandet 2013", "Hedmark NORD 2015" and "Møre 2013". – 9ls1 May 14 '18 at 19:47
  • requests may be a lot easier... you'd typically use that for a more friendly interface. There are URL builder functions in urllib as well. Check here https://stackoverflow.com/questions/15799696/library-to-build-urls-in-python . You can add the URL quote line to one of the builder examples. – sehafoc May 14 '18 at 19:52

1 Answer

The quick answer is that you need to percent-encode the string.

Here is an example:

>>> import urllib.parse
>>> s = 'Møre 2013'
>>> urllib.parse.quote(s)
'M%C3%B8re%202013'
>>> urllib.parse.unquote('M%C3%B8re%202013')
'Møre 2013'
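
Applied to the question, the same call can be used on the whole `request` value before it is appended to the URL. The following is only a minimal sketch: the helper name `build_export_url` is made up for illustration, the parameter list is shortened to two fields, and it assumes the server decodes percent-encoded UTF-8 in the query string.

```python
import urllib.parse

BASE_URL = "https://tjenester.norgeibilder.no/REST/StartExport.ashx"

def build_export_url(project, comment):
    """Hypothetical helper: build an ASCII-only URL for a shortened request."""
    # Plain text goes in here: a real space and a real "ø", no hand-written %20.
    request = '{comment:"' + comment + '",projects:"' + project + '",imagemosaic:2}'
    # quote() encodes the text as UTF-8 and percent-escapes unsafe characters.
    # safe="" also escapes "/", which quote() leaves alone by default.
    return BASE_URL + "?request=" + urllib.parse.quote(request, safe="")

url = build_export_url("Møre 2013", "04280594")
url.encode("ascii")  # does not raise: the URL contains only ASCII characters
print(url)
```

With this approach the csv-file can hold the readable names ("Møre 2013") and the escaping happens in one place, just before `urlopen` is called.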

The longer answer is that the set of characters allowed in a URL is very limited; anything outside it, such as "ø" or a space, has to be percent-encoded.

See this answer for more details: https://stackoverflow.com/a/13500078/3776268

And also (from the linked answer): the reasons why the characters are restricted are spelled out in RFC 1738 (URLs) and RFC 2396 (URIs). Note the newer RFC 3986, which updates RFC 1738.
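
To see the restriction in practice, the short sketch below shows what `urllib.parse.quote` leaves untouched and what it escapes. It uses only the standard library; the project names are the ones mentioned in the comments, and the first sample string is purely illustrative.

```python
import urllib.parse

# ASCII letters, digits and "_.-" are left as they are; "/" is safe by default.
unchanged = urllib.parse.quote("abc-XYZ_0.9/")
print(unchanged)

# Spaces and non-ASCII letters are replaced by their UTF-8 bytes as %XX escapes.
encoded_names = {}
for name in ["Østlandet 2013", "Hedmark NORD 2015", "Møre 2013"]:
    encoded_names[name] = urllib.parse.quote(name)
    # unquote() reverses the escaping, so nothing is lost on the way.
    assert urllib.parse.unquote(encoded_names[name]) == name
    print(name, "->", encoded_names[name])
```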

sehafoc