splitting a full url into parts

Question

I'm trying to split a url into parts so that I can work with these separately.

For example, given the URL:

'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'

How can I split this into:

1) the source/origin (i.e. protocol + subdomain + domain)
2) the path: '/api/addresses'
3) the query: '?postcode=XXSDF&houseNo=34'

score 2 · Answer 1 · answered May 23 '16 at 15:13

You can just use Python's urlparse module (this is the Python 2 name; in Python 3 it lives in urllib.parse).

>>> from urlparse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o   
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> o.scheme
'http'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'

score 1 · Accepted Answer · edited May 23 '17 at 12:24

The urlparse function, found in urllib.parse in Python 3, is designed for this. Example adapted from the documentation:

>>> from urllib.parse import urlparse
>>> o = urlparse('https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34')
>>> o   
ParseResult(scheme='https', netloc='api.somedomain.co.uk', path='/api/addresses', params='', query='postcode=XXSDF&houseNo=34', fragment='')
>>> o.scheme
'https'
>>> o.port
None
>>> o.geturl()
'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'

To get the host, path and query, the API is straightforward:

>>> print(o.hostname, o.path, o.query)

Returns:

api.somedomain.co.uk /api/addresses postcode=XXSDF&houseNo=34

To get the subdomain itself, the only way seems to be to split the hostname on '.'.
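A minimal sketch of that split, assuming the first label of the hostname is the subdomain you want. That assumption holds for the example URL but not in general: with multi-part suffixes such as co.uk, finding where the registered domain starts requires a public suffix list, which the standard library does not provide.

```python
from urllib.parse import urlsplit

url = 'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'

# hostname is the netloc without any port or credentials, lowercased
labels = urlsplit(url).hostname.split('.')

# Assumption: the leftmost label is the subdomain
subdomain = labels[0]
print(labels)
print(subdomain)
```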

Note that urllib.parse.urlsplit should generally be used instead of urlparse, according to the documentation (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit):

This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
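As a rough sketch, the three pieces asked for in the question could be assembled from urlsplit like this. The helper name split_url is made up for illustration, and the leading '?' is re-added by hand because the query attribute does not include it.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return (origin, path, query) for a URL; hypothetical helper."""
    parts = urlsplit(url)
    # Origin = scheme + netloc (netloc keeps the port, if one is given)
    origin = f"{parts.scheme}://{parts.netloc}"
    # parts.query has no leading '?', so add it back only when non-empty
    query = f"?{parts.query}" if parts.query else ""
    return origin, parts.path, query

url = 'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'
origin, path, query = split_url(url)
print(origin, path, query)
```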

Thanks - I love how python has a tool for everything. – Yunti, May 23 '16 at 15:35

score 0 · Answer 3 · answered May 23 '16 at 15:12

You probably want the stdlib module urlparse on Python 2, or urllib.parse on Python 3. This will split the URL up more finely than you're asking for, but it's not difficult to put the pieces back together again.
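One way to put the pieces back together on Python 3, sketched with urlunsplit from the same module. Leaving the path, query and fragment empty gives just the origin; passing the full result back rebuilds the original URL.

```python
from urllib.parse import urlsplit, urlunsplit

url = 'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'
parts = urlsplit(url)

# Keep scheme and netloc, blank out path, query and fragment
origin = urlunsplit((parts.scheme, parts.netloc, '', '', ''))

# The 5-tuple from urlsplit can be passed straight back to urlunsplit
rebuilt = urlunsplit(parts)
print(origin)
print(rebuilt == url)
```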

Gareth McCaughan