164

I have two urls:

url1 = "http://127.0.0.1/test1/test2/test3/test5.xml"
url2 = "../../test4/test6.xml"

How can I get an absolute url for url2?

ndmeiri
  • 4,979
  • 12
  • 37
  • 45
Bdfy
  • 23,141
  • 55
  • 131
  • 179
  • 2
    possible duplicate of [How to join components of a path when you are constructing a URL in Python](http://stackoverflow.com/questions/1793261/how-to-join-components-of-a-path-when-you-are-constructing-a-url-in-python) – Mirzhan Irkegulov Oct 09 '13 at 07:41
  • Related: [Joining url path components intelligently](http://codereview.stackexchange.com/questions/13027/joining-url-path-components-intelligently) – kojiro Jan 15 '14 at 15:42

6 Answers6

312

You should use urlparse.urljoin :

>>> import urlparse
>>> urlparse.urljoin(url1, url2)
'http://127.0.0.1/test1/test4/test6.xml'

With Python 3 (where urlparse is renamed to urllib.parse) you could use it as follow:

>>> import urllib.parse
>>> urllib.parse.urljoin(url1, url2)
'http://127.0.0.1/test1/test4/test6.xml'
Cédric Julien
  • 78,516
  • 15
  • 127
  • 132
  • 10
    How we use `urljoin` with 3 or mode parameters or which library do you recommend for this? – Mesut Tasci Apr 23 '13 at 00:13
  • @mesuutt try to make a loop and join each part with the previously joined URL. – Cédric Julien Apr 23 '13 at 07:35
  • 3
    @CédricJulien: a simple loop will not work, as any path with a leading `/` will "reset" and return scheme + netloc + lasturl: `urlparse.urljoin('http://www.a.com/b/c/d', '/e') => 'http://www.a.com/e'` – MestreLion Nov 05 '13 at 17:12
  • If using the urljoin, there's a problem. For example, `urljoin('http://www.a.com/', '../../b/c.png')`, the result is `'http://www.a.com/../../b/c.png'`, but not `http://www.a.com/b/c.png`. So, is there any method to get `http://www.a.com/b/c.png`? – bigwind Jul 02 '14 at 03:54
  • 1
    Link to Python 3 documentation points to Python 2 documentation, it needs to updated in the answer, it is https://docs.python.org/3.6/library/urllib.parse.html#urllib.parse.urljoin – Harsh Jun 08 '17 at 06:55
  • For `urljoin` with 3 or more parts, see my answer below: https://stackoverflow.com/a/54217907/319397 – pcv Jan 16 '19 at 13:19
  • @bigwind — In last Python version (I tested with 3.8) `urljoin` works correctly with your case – DemX86 Dec 04 '19 at 11:45
22

If your relative path consists of multiple parts, you have to join them separately, since urljoin would replace the relative path, not join it. The easiest way to do that is to use posixpath.

>>> import urllib.parse
>>> import posixpath
>>> url1 = "http://127.0.0.1"
>>> url2 = "test1"
>>> url3 = "test2"
>>> url4 = "test3"
>>> url5 = "test5.xml"
>>> url_path = posixpath.join(url2, url3, url4, url5)
>>> urllib.parse.urljoin(url1, url_path)
'http://127.0.0.1/test1/test2/test3/test5.xml'

See also: How to join components of a path when you are constructing a URL in Python

pcv
  • 2,121
  • 21
  • 25
15

You can use reduce to achieve Shikhar's method in a cleaner fashion.

>>> import urllib.parse
>>> from functools import reduce
>>> reduce(urllib.parse.urljoin, ["http://moc.com/", "path1/", "path2/", "path3/"])
'http://moc.com/path1/path2/path3/'

Note that with this method each fragment should have trailing forward-slash, with no leading forward-slash, to indicate it is a path fragment being joined.

This is more correct/informative, telling you that path1/ is a URI path fragment, and not the full path (e.g. /path1/) or an unknown (e.g. path1). An unknown could be either, but they are handled as a full path.

If you need to add / to a fragment lacking it, you could do:

uri = uri if uri.endswith("/") else f"{uri}/"

To learn more about URI resolution, Wikipedia has some nice examples.

Updates

  • Just noticed Peter Perron commented about reduce on Shikhar's answer, but I'll leave this here then to demonstrate how that's done.

  • Updated wikipedia URL

Jonathan Rioux
  • 1,067
  • 2
  • 14
  • 30
ryanjdillon
  • 17,658
  • 9
  • 85
  • 110
12

For python 3.0+ the correct way to join urls is:

from urllib.parse import urljoin
urljoin('https://10.66.0.200/', '/api/org')
# output : 'https://10.66.0.200/api/org'
srth12
  • 873
  • 9
  • 16
11
es = ['http://127.0.0.1', 'test1', 'test4', 'test6.xml']
base = ''
map(lambda e: urlparse.urljoin(base, e), es)
Shikhar Mall
  • 159
  • 2
  • 2
  • 10
    Good way to support a list of values. You can remove your side effect (your "base" variable) by using a reduce though. `reduce(lambda a, b: urlparse.urljoin(a, b), es)` A map is `list[n] - to -> list[n]` A reduce is `list[n] - to -> a calculated value` – Peter Perron Apr 09 '18 at 13:16
4
>>> from urlparse import urljoin
>>> url1 = "http://www.youtube.com/user/khanacademy"
>>> url2 = "/user/khanacademy"
>>> urljoin(url1, url2)
'http://www.youtube.com/user/khanacademy'

Simple.

Talha Ashraf
  • 1,201
  • 2
  • 14
  • 19