The urlparse function, found in urllib.parse in Python 3, is designed for this. Example adapted from the documentation:
>>> from urllib.parse import urlparse
>>> o = urlparse('https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34')
>>> o
ParseResult(scheme='https', netloc='api.somedomain.co.uk', path='/api/addresses', params='', query='postcode=XXSDF&houseNo=34', fragment='')
>>> o.scheme
'https'
>>> o.port
None
>>> o.geturl()
'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'
To get the host, path and query, the API is straightforward:
>>> print(o.hostname, o.path, o.query)
This prints:
api.somedomain.co.uk /api/addresses postcode=XXSDF&houseNo=34
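If you need the individual query parameters rather than the raw query string, urllib.parse.parse_qs can split it for you. A minimal sketch using the same URL as above:

```python
from urllib.parse import urlparse, parse_qs

url = 'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'
o = urlparse(url)

# parse_qs maps each parameter name to a list of values,
# because a name may appear more than once in a query string.
params = parse_qs(o.query)

postcode = params['postcode'][0]
house_no = params['houseNo'][0]
print(postcode, house_no)
```

Note that every value comes back as a string, so convert houseNo with int() yourself if you need a number.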
To get the subdomain itself, the only way seems to be to split the hostname on the '.' character.
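A rough sketch of that splitting, with one caveat: the standard library does not know that 'co.uk' is a two-label suffix, so you have to tell it. The helper name split_hostname and its suffix_labels parameter are my own invention for illustration, not part of urllib:

```python
from urllib.parse import urlparse

def split_hostname(url, suffix_labels=1):
    """Return (subdomain, registered_domain) for the URL's hostname.

    suffix_labels is a hypothetical parameter: the number of labels in
    the public suffix (1 for 'com', 2 for 'co.uk'). The caller must
    supply it, since urllib has no knowledge of public suffixes.
    """
    hostname = urlparse(url).hostname or ''
    labels = hostname.split('.')
    keep = suffix_labels + 1  # suffix plus the registered name
    if len(labels) <= keep:
        return '', hostname   # no subdomain present
    return '.'.join(labels[:-keep]), '.'.join(labels[-keep:])

url = 'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'
print(split_hostname(url, suffix_labels=2))
```

With suffix_labels=2 this gives 'api' and 'somedomain.co.uk'. With the default of 1 it would treat 'co.uk' as the domain, which is why a naive "last two labels" rule is unreliable for hostnames like this one.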
Note that urllib.parse.urlsplit should generally be used instead of urlparse, according to the documentation:
This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted. (Source: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit)
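To make the difference concrete, here is a small comparison. For the URL in the question the two functions agree on everything except that urlsplit has no params field. The second URL, with ';' parameters in the path, is a made-up example to show where they diverge:

```python
from urllib.parse import urlparse, urlsplit

url = 'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'
s = urlsplit(url)

# SplitResult has five fields: scheme, netloc, path, query, fragment.
print(s.hostname, s.path, s.query)

# Hypothetical URL with ';' parameters on path segments.
with_params = 'https://example.com/a;x=1/b;y=2?q=1'

parsed = urlparse(with_params)   # splits params off the last segment only
split = urlsplit(with_params)    # leaves the path untouched

print(parsed.path, parsed.params)
print(split.path)
```

urlparse strips only the parameters of the last path segment into params, while urlsplit leaves the whole path as it is, so you can handle per-segment parameters yourself.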