
I'm trying to split a URL into parts so that I can work with them separately.

For example, the URL:

'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'

How can I split this into:
1) the source/origin (i.e. protocol + subdomain + domain)
2) the path: '/api/addresses'
3) the query: '?postcode=XXSDF&houseNo=34'

Yunti

3 Answers


You can just use Python's urlparse module (this is Python 2; in Python 3 it lives in urllib.parse):

>>> from urlparse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o   
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> o.scheme
'http'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'
Games Brainiac

The urlparse module, found in urllib.parse in Python 3, is designed for this. Example adapted from the documentation:

>>> from urllib.parse import urlparse
>>> o = urlparse('https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34')
>>> o   
ParseResult(scheme='https', netloc='api.somedomain.co.uk', path='/api/addresses', params='', query='postcode=XXSDF&houseNo=34', fragment='')
>>> o.scheme
'https'
>>> print(o.port)
None
>>> o.geturl()
'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'

To get the host, path and query, the API is straightforward:

>>> print(o.hostname, o.path, o.query)

This prints:

api.somedomain.co.uk /api/addresses postcode=XXSDF&houseNo=34
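The question asks for the origin, the path, and the query including its leading '?'. A minimal sketch that assembles those three pieces from the same ParseResult fields, assuming the origin is simply scheme plus netloc (netloc would also carry any user:password@ prefix or :port if the URL had one):

```python
from urllib.parse import urlparse

url = 'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'
o = urlparse(url)

# Origin: scheme plus network location (host, and port if one is given)
origin = o.scheme + '://' + o.netloc
# Path component as parsed
path = o.path
# urlparse strips the leading '?', so add it back only when a query exists
query = '?' + o.query if o.query else ''

print(origin)  # https://api.somedomain.co.uk
print(path)    # /api/addresses
print(query)   # ?postcode=XXSDF&houseNo=34
```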

To get the subdomain itself, the only way seems to be splitting the hostname on '.'.
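A sketch of that split for the example URL. It assumes, purely for illustration, that the registered domain is the last three labels (somedomain.co.uk). REGISTERED_DOMAIN_LABELS is a hypothetical constant; finding that boundary for arbitrary hosts would need the Public Suffix List rather than a fixed count:

```python
from urllib.parse import urlparse

o = urlparse('https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34')

# Break the hostname into its dot-separated labels
labels = o.hostname.split('.')
print(labels)  # ['api', 'somedomain', 'co', 'uk']

# Hypothetical: treat the last three labels as the registered domain.
# This fits 'somedomain.co.uk' but not, say, 'example.com' (two labels).
REGISTERED_DOMAIN_LABELS = 3
subdomain = '.'.join(labels[:-REGISTERED_DOMAIN_LABELS])
domain = '.'.join(labels[-REGISTERED_DOMAIN_LABELS:])

print(subdomain)  # api
print(domain)     # somedomain.co.uk
```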


Note that the documentation for urllib.parse.urlsplit() (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit) recommends it over urlparse() in some cases:

This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
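To illustrate the difference the quote refers to, a small sketch: urlsplit() returns five fields with no separate params, while urlparse() splits ;params off the last path segment only. The second URL, https://example.com/a;x/b;y, is a made-up example:

```python
from urllib.parse import urlparse, urlsplit

url = 'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'

s = urlsplit(url)
# SplitResult has no 'params' field
print(s._fields)        # ('scheme', 'netloc', 'path', 'query', 'fragment')
print(s.path, s.query)  # /api/addresses postcode=XXSDF&houseNo=34

# Hypothetical URL with ';' parameters on two path segments
odd = 'https://example.com/a;x/b;y'
p = urlparse(odd)
# urlparse only splits the params off the last segment
print(p.path, p.params)    # /a;x/b y
# urlsplit leaves the path untouched
print(urlsplit(odd).path)  # /a;x/b;y
```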

aluriak

You probably want the stdlib module urlparse on Python 2, or urllib.parse on Python 3. This will split the URL up more finely than you're asking for, but it's not difficult to put the pieces back together again.
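A sketch of putting the pieces back together with urllib.parse.urlunsplit() on Python 3. Blanking some of the five components yields just the origin, or just the path plus query:

```python
from urllib.parse import urlsplit, urlunsplit

url = 'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'
scheme, netloc, path, query, fragment = urlsplit(url)

# Only scheme and netloc: the origin
origin = urlunsplit((scheme, netloc, '', '', ''))
# No scheme or netloc: a relative reference with path and query
path_and_query = urlunsplit(('', '', path, query, ''))
# All five pieces: round-trips back to the original string
rebuilt = urlunsplit((scheme, netloc, path, query, fragment))

print(origin)          # https://api.somedomain.co.uk
print(path_and_query)  # /api/addresses?postcode=XXSDF&houseNo=34
print(rebuilt == url)  # True
```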

Gareth McCaughan