python urljoin not finding the absolute path

Question

I'm trying to get the absolute path, but I don't get the correct result. This is what I'm trying:

Given I have this html page url:

url1 = 'build/en/index.html'

and I have this relative path in the file:

url2  = '/pub-assets/css/indexen.css'

I'm doing:

urljoin(url1, url2)

So I should get build/pub-assets/css/indexen.css

but I don't get what I expect. Any suggestions would be much appreciated.

Possible duplicate of https://stackoverflow.com/questions/10893374/python-confusions-with-urljoin — Md Johirul Islam, Apr 13 '18 at 21:14
@MdJohirulIslam I've already seen this post but it did not solve my issue — Milix, Apr 13 '18 at 21:18
You are doing some other mistake. I run your code and get the correct path `https://example.com/en/pub-assets/css/indexen.css`. If you upload a little bit more code then that would help — Md Johirul Islam, Apr 13 '18 at 21:21

score 0 · Answer 1 · answered Apr 13 '18 at 21:22

If url1 points to a file (rather than a directory), you can rebuild the path yourself with urlsplit and SplitResult._replace:

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2

url1 = 'https://example.com/en/index.html'
url2 = 'pub-assets/css/indexen.css'

parts = urlsplit(url1)
p = parts.path
# Keep everything up to and including the last '/', then append url2
new_path = p[:p.rfind('/') + 1] + url2
joined = parts._replace(path=new_path)
print(joined.geturl())  # Outputs https://example.com/en/pub-assets/css/indexen.css

This assumes that url1 is an absolute URL and url2 is a relative path with no leading slash.
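If url2 may or may not start with a slash, a minimal sketch of the same idea could strip the slash before appending. The helper name `join_to_parent_dir` is hypothetical (it is not part of any library), and Python 3 imports are assumed:

```python
from urllib.parse import urlsplit


def join_to_parent_dir(url1, url2):
    """Append url2 to the directory part of url1 (hypothetical helper)."""
    parts = urlsplit(url1)
    # Everything up to and including the last '/' is the parent directory.
    directory = parts.path[:parts.path.rfind('/') + 1]
    # Drop any leading '/' so the join never produces '//'.
    new_path = directory + url2.lstrip('/')
    return parts._replace(path=new_path).geturl()


print(join_to_parent_dir('https://example.com/en/index.html',
                         'pub-assets/css/indexen.css'))
print(join_to_parent_dir('build/en/index.html',
                         '/pub-assets/css/indexen.css'))
```

Note that this always resolves against the directory containing url1, so for `build/en/index.html` the result stays under `build/en/`, not `build/`.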

Sunny Patel

url1 = 'build/en/index.html' --- url2 = '/pub-assets/css/indexen.css' it results in: build/en//pub-assets/css/indexen.css – Milix Apr 17 '18 at 00:37
@Milix This was using your question as it was originally asked without the leading slash in `url2`. If you will always have a leading slash with `url2`, then drop the trailing slash by removing the `+1` in the answer. – Sunny Patel Apr 17 '18 at 14:35

score 0 · Answer 2 · answered Apr 13 '18 at 21:53

Python 3.6.1:

>>> u1 = 'https://example.com/en/index.html'
>>> u2 = 'pub-assets/css/indexen.css'
>>> import urllib.parse
>>> urllib.parse.urljoin(u1, u2)
'https://example.com/en/pub-assets/css/indexen.css'

Python 2.7.14:

>>> u1 = 'https://example.com/en/index.html'
>>> u2 = 'pub-assets/css/indexen.css'
>>> import urlparse
>>> urlparse.urljoin(u1, u2)
'https://example.com/en/pub-assets/css/indexen.css'

Note the changed import. I would double-check your Python version, import statement, and perhaps post more of your program.
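For the exact strings in the question, the leading slash on the second argument is likely what matters. A quick check, assuming Python 3's `urllib.parse`: a reference that starts with `/` is treated as root-relative and replaces the whole base path, a reference without it resolves against the directory of the base, and `../` climbs one directory.

```python
from urllib.parse import urljoin

base = 'build/en/index.html'

# Leading slash: root-relative, so the base path is discarded entirely.
rooted = urljoin(base, '/pub-assets/css/indexen.css')
# No leading slash: resolved against the directory of the base ('build/en/').
sibling = urljoin(base, 'pub-assets/css/indexen.css')
# '../' climbs one directory, from 'build/en/' to 'build/'.
parent = urljoin(base, '../pub-assets/css/indexen.css')

print(rooted)   # /pub-assets/css/indexen.css
print(sibling)  # build/en/pub-assets/css/indexen.css
print(parent)   # build/pub-assets/css/indexen.css
```

So `urljoin` on its own will not turn `/pub-assets/...` into `build/pub-assets/...`, because it has no way to know that `build/` is meant to be the site root.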

sam

I'd encourage you to try parsing the urls using some more specific methods in [urllib.parse](https://docs.python.org/3/library/urllib.parse.html). – sam Apr 17 '18 at 03:19
Also, using Python 3.6.4, I can `urljoin` your example URLs and I get `build/en/pub-assets/css/indexen.css`. What version of Python are you using? – sam Apr 17 '18 at 03:22
