
Introductory tags: python | python-3.x | url | parameters | urlopen

Used language: Python 3.x

Used modules: urlopen | urllib.request

Status: Not yet resolved

Description of the problem:

I have this URL:

 http://mapy.cz/#mm=TTtP@x=133054720@y=135947904@z=13

and in a web browser it redirects me to another URL:

 https://mapy.cz/zakladni?x=14.412346408814274&y=50.08612581835152&z=13

I want to get the parameters x and y from the query string of the redirected URL.

 x = 14.412346408814274
 y = 50.08612581835152

(Geographic coordinates in decimal degrees.)
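For reference, once the redirected URL is available as a string, extracting x and y is the easy part. A minimal sketch using the standard library's urllib.parse; the variable names `final_url` and `query` are only illustrative:

```python
from urllib.parse import urlsplit, parse_qs

# The redirected URL exactly as quoted above.
final_url = "https://mapy.cz/zakladni?x=14.412346408814274&y=50.08612581835152&z=13"

# parse_qs maps each query key to a list of string values.
query = parse_qs(urlsplit(final_url).query)

# Take the first (and only) value for each key and convert it to a float.
x = float(query["x"][0])
y = float(query["y"][0])
print(x, y)
```

The hard part, as the rest of the question shows, is obtaining that redirected URL in the first place.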

When I use:

 from urllib.request import urlopen

 url = "http://mapy.cz/#mm=TTtP@x=133168128@y=133141248@z=13"
 print(urlopen(url).url)

It prints:

 https://mapy.cz/

When I use:

with urlopen(url) as conn:
    newUrl = conn.geturl()
    print(newUrl)

It prints:

 https://mapy.cz/

When I use:

with urlopen(url) as conn:
    print(conn.info())

It prints:

Server: nginx
Date: Sun, 03 Jun 2018 23:24:31 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Cache-Control: max-age=0
Expires: Sun, 03 Jun 2018 23:24:31 GMT
Strict-Transport-Security: max-age=31536000

When I use:

with urlopen(url) as conn:
    print(conn.__dict__)

It prints:

{'fp': <_io.BufferedReader name=1404>, 'debuglevel': 0, '_method': 'GET', 'headers': <http.client.HTTPMessage object at 0x000000000DD48518>, 'msg': 'OK', 'version': 11, 'status': 200, 'reason': 'OK', 'chunked': True, 'chunk_left': None, 'length': None, 'will_close': True, 'code': 200, 'url': 'https://mapy.cz/'}

There is no mention of the parameters or the path after the slash, from either the original URL or the redirected one.
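One likely explanation, which is not confirmed anywhere in this thread: everything after the # is a URL fragment, and HTTP clients do not send the fragment to the server, so the server only ever sees a request for /. If so, the redirect seen in the browser would come from JavaScript on the page reading the fragment, which is an assumption about mapy.cz. A small sketch with urllib.parse shows how the original URL is split:

```python
from urllib.parse import urlsplit

url = "http://mapy.cz/#mm=TTtP@x=133054720@y=135947904@z=13"
parts = urlsplit(url)

print(repr(parts.path))      # path component, part of the HTTP request
print(repr(parts.query))     # query string, part of the HTTP request
print(repr(parts.fragment))  # fragment, kept on the client side
```

Here the path is just `/` and the query is empty, while all of the parameters sit in the fragment.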

When I use the code from "What is the quickest way to HTTP GET in Python?":

import urllib.request
contents = urllib.request.urlopen(url).read()

It returns:

'raw html...'

I don't want to download the HTML and mine those parameters from it.

Stilgar Dragonclaw
  • In general, if urlopen, requests, etc. act differently than your browser, I'd first suspect the User-Agent header -- just an idea. – jedwards Jun 04 '18 at 01:37
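To try the User-Agent idea from the comment above, a request with a custom header could be built roughly like this. This is only a sketch: the header value is a made-up placeholder, and nothing in the thread shows that changing it alters the result for this URL.

```python
from urllib.request import Request

url = "http://mapy.cz/#mm=TTtP@x=133054720@y=135947904@z=13"

# Hypothetical browser-like value; any string can be used here.
req = Request(url, headers={"User-Agent": "Mozilla/5.0 (example)"})

# urllib stores header names capitalized, hence "User-agent".
print(req.get_header("User-agent"))

# Passing req to urllib.request.urlopen(req) would send this header.
```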

1 Answer


If you use requests (which you should), here is a solution for your specific URL:

import requests

URL = 'http://mapy.cz/#mm=TTtP@x=133054720@y=135947904@z=13'

response = requests.get(URL)

if response.history:
    print("Request was redirected")
    for resp in response.history:
        print((resp.status_code, resp.url,))
        # getting x, y and z
        parts = resp.url.split('@')
        x = parts[1].split('=')[1]
        y = parts[2].split('=')[1]
        z = parts[3].split('=')[1]
        print("X {} Y {} Z {}".format(x, y, z))

    print("Final destination:")
    print((response.status_code, response.url,))
else:
    print("Request was not redirected")
Pablo Santa Cruz
  • It doesn't return what OP expected (it only splits the same thing in the input URL). – cs95 Jun 04 '18 at 01:24
  • I ran it and returned the exact values for `x, y, z`. What values did you get? – Pablo Santa Cruz Jun 04 '18 at 01:28
  • 1
    I got this: `X 133054720 Y 135947904 Z 13` – cs95 Jun 04 '18 at 01:31
  • This code returns: Request was redirected (301, 'http://mapy.cz/#mm=TTtP@x=133054720@y=135947904@z=13') X 133054720 Y 135947904 Z 13 Final destination: (200, 'https://mapy.cz/') http://mapy.cz/#mm=TTtP@x=133054720@y=135947904@z=13. The `response.status_code` changes (301 -> 200), but the part after the slash is the same. – Stilgar Dragonclaw Jun 04 '18 at 10:43
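As the comments point out, the values the answer prints are the ones already present in the input URL. A short sketch makes this explicit by parsing the fragment locally, with no request at all; the `params` name is illustrative:

```python
from urllib.parse import urlsplit

url = "http://mapy.cz/#mm=TTtP@x=133054720@y=135947904@z=13"

# The fragment uses "@" between key=value pairs.
fragment = urlsplit(url).fragment
params = dict(pair.split("=", 1) for pair in fragment.split("@"))

print(params["x"], params["y"], params["z"])
```

These are the raw values from the fragment, not coordinates in decimal degrees. The thread does not say how mapy.cz converts between the two.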