42

What is the preferred way to check whether a URL is relative or absolute?

Geo

3 Answers

71

Python 2

You can use the urlparse module to parse a URL and then check whether it is relative or absolute by testing whether the host name (netloc) is set.

>>> import urlparse
>>> def is_absolute(url):
...     return bool(urlparse.urlparse(url).netloc)
... 
>>> is_absolute('http://www.example.com/some/path')
True
>>> is_absolute('//www.example.com/some/path')
True
>>> is_absolute('/some/path')
False

Python 3

urlparse has been moved to urllib.parse, so use the following:

from urllib.parse import urlparse

def is_absolute(url):
    return bool(urlparse(url).netloc)
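
As a rough sketch of how the three cases in the examples above (a full URL, a `//host/...` URL, and a plain path) could be told apart in Python 3, the helper below uses `urlsplit`. The name `classify_url` and its three labels are hypothetical, not part of the standard library, and scheme-only URLs such as `mailto:` links are not handled specially.

```python
from urllib.parse import urlsplit

def classify_url(url):
    """Hypothetical helper: label a URL by which components urlsplit finds."""
    parts = urlsplit(url)
    if parts.scheme and parts.netloc:
        # Both scheme and host present, e.g. http://www.example.com/some/path
        return "absolute"
    if parts.netloc:
        # Host but no scheme, e.g. //www.example.com/some/path
        return "protocol-relative"
    # No host at all; the whole string is treated as a path
    return "relative"

for url in ("http://www.example.com/some/path",
            "//www.example.com/some/path",
            "/some/path",
            "www.example.com/some/path"):
    print(url, "->", classify_url(url))
```

Note that `www.example.com/some/path` is reported as relative, because without a scheme or leading `//` the parser has no way to recognise the host.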
Kukanani
Lukáš Lalinský
  • 4
    Shouldn't `www.example.com/some/path` count as absolute too? – Geo Dec 02 '11 at 14:21
  • 5
    Officially, that's a relative URL with the whole string as the path. If you want it to count as absolute, you would have to either add the `http://` in some pre-processing step or not use `urlparse`. – Lukáš Lalinský Dec 02 '11 at 14:26
  • 4
    According to RFC `//google.com` is a protocol-relative url. And your code will return `False` for it. – Nik Feb 13 '14 at 15:07
  • I'd prefer `urlsplit` instead of `urlparse`. BTW, in Django you have a Python 2 & 3 compatible way: `from django.utils.six.moves.urllib.parse import urlsplit, urlparse` – Rockallite Aug 21 '17 at 07:19
  • If you want Python 2 & 3 compatibility just use six module (`six.moves.urllib.parse`) -> https://pythonhosted.org/six/#module-six.moves.urllib.parse – mateuszb Sep 24 '17 at 10:23
  • 1
    @Nik not for me: In [27]: urlparse('//google.com') Out[27]: ParseResult(scheme='', netloc='google.com', path='', params='', query='', fragment='') – Sean Aug 06 '21 at 11:41
29

If you want to know whether a URL is absolute or relative in order to join it with a base URL, I usually just call urllib.parse.urljoin anyway:

>>> from urllib.parse import urljoin
>>> urljoin('http://example.com/', 'http://example.com/picture.png')
'http://example.com/picture.png'
>>> urljoin('http://example1.com/', '/picture.png')
'http://example1.com/picture.png'
>>> 
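
To make the behaviour a little more concrete, here is a small sketch of what `urljoin` returns for a few kinds of second argument. The base URL and file names are made up for illustration; only `urljoin` itself comes from the standard library.

```python
from urllib.parse import urljoin

# Illustrative base URL (an assumption, not from the original answer)
base = 'http://example.com/dir/page.html'

# Relative path: resolved against the directory of the base URL
print(urljoin(base, 'picture.png'))
# Root-relative path: keeps scheme and host, replaces the whole path
print(urljoin(base, '/picture.png'))
# Protocol-relative URL: keeps only the scheme of the base
print(urljoin(base, '//cdn.example.org/picture.png'))
# Absolute URL: the base is ignored entirely
print(urljoin(base, 'https://other.example.org/picture.png'))
# A bare host name without a scheme is treated as a relative path
print(urljoin('http://www.yahoo.com', 'www.google.com'))
```

The last call is the case to watch for: a host name with no scheme and no leading `//` is indistinguishable from a relative path, so it is appended to the base instead of replacing it.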
Bob Whitelock
warvariuc
  • 4
    It turns out that this is what I wanted to do - it treats the first URL as the default for all unspecified parts of the second URL. If the second one is absolute, it just uses that one. – rescdsk Oct 21 '13 at 15:22
  • 2
    Anyone using this should be aware that if given `http://www.yahoo.com` and `www.google.com` as inputs, this will give you `http://www.yahoo.com/www.google.com` as output, which probably isn't what you wanted. So you'll still have to check somehow whether the second one is a URL without a scheme or is actually a relative path. – J. Taylor Feb 02 '19 at 06:36
2

I can't comment on the accepted answer, so I'm writing this comment as a new answer: IMO, checking the scheme as in the accepted answer ( bool(urlparse.urlparse(url).scheme) ) is not really a good idea, because http://example.com/file.jpg, https://example.com/file.jpg and //example.com/file.jpg are all absolute URLs, but in the last case we get scheme = ''.

I use this code:

is_absolute = '//' in my_url
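
One caveat with a plain substring test, shown here only as a hedged sketch: `'//'` can also occur inside a path or query string. The two helper names and the sample URLs below are hypothetical and chosen for illustration; the second helper uses the netloc check from `urllib.parse` for comparison.

```python
from urllib.parse import urlsplit

def has_double_slash(url):
    # The substring test: true if '//' appears anywhere in the string
    return '//' in url

def has_netloc(url):
    # The parser-based test: true only if a host component was recognised
    return bool(urlsplit(url).netloc)

samples = [
    '//example.com/file.jpg',              # protocol-relative URL
    '/redirect?next=http://example.com/',  # '//' only inside the query string
    'images//logo.png',                    # '//' only inside the path
]
for url in samples:
    print(url, has_double_slash(url), has_netloc(url))
```

The two checks agree on the first sample and disagree on the other two, where the string contains `//` but no host of its own.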

  • 2
    AFAIK //foo/bar is a valid relative URL. With "relative" meaning "without scheme and netloc". – guettli Dec 18 '17 at 10:39