
I'm trying to get the absolute path, but I don't get the correct result. This is what I'm trying:

Given this HTML page URL:

url1 = 'build/en/index.html'

and I have this relative path in the file:

url2  = '/pub-assets/css/indexen.css'

I'm doing:

urljoin(url1, url2)

So I expect to get build/pub-assets/css/indexen.css

but I don't get the expected result. Any suggestions are much appreciated.

Milix
  • Possible duplicate of https://stackoverflow.com/questions/10893374/python-confusions-with-urljoin – Md Johirul Islam Apr 13 '18 at 21:14
  • @MdJohirulIslam I've already seen this post but it did not solve my issue – Milix Apr 13 '18 at 21:18
  • You are making some other mistake. I ran your code and got the correct path `https://example.com/en/pub-assets/css/indexen.css`. If you post a bit more code, that would help – Md Johirul Islam Apr 13 '18 at 21:21
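For the inputs in the question, `urljoin` treats the leading `/` in url2 as "start from the root of the path", so the base path `build/en/` is discarded entirely rather than partially kept. A minimal Python 3 sketch of that behaviour, plus one way to get `build/pub-assets/...` under the assumption that `build/` is the directory a leading `/` should map to (`site_root` is a hypothetical name, not from the question):

```python
from urllib.parse import urljoin

url1 = 'build/en/index.html'
url2 = '/pub-assets/css/indexen.css'

# A leading '/' makes url2 root-relative: urljoin throws away the whole
# base path and returns url2's path unchanged.
print(urljoin(url1, url2))  # /pub-assets/css/indexen.css

# Assumption: 'build/' plays the role of the site root for '/'-paths.
# Strip the leading slash and resolve against that root instead.
site_root = 'build/'  # hypothetical; adjust to your build layout
print(urljoin(site_root, url2.lstrip('/')))  # build/pub-assets/css/indexen.css
```

The trailing slash on `site_root` matters: without it, `urljoin` treats `build` as a file name and drops it, returning `pub-assets/css/indexen.css`.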

2 Answers


If your url1 is a file (instead of a directory), you can build the new path yourself by using urlsplit and SplitResult._replace to modify the result.

from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit

url1 = 'https://example.com/en/index.html'
url2 = 'pub-assets/css/indexen.css'

p = urlsplit(url1).path
new_path = p[:p.rfind('/') + 1] + url2  # keep the path up to and including the last '/', then append url2
joined = urlsplit(url1)._replace(path=new_path)
print(joined.geturl())  # Outputs https://example.com/en/pub-assets/css/indexen.css

This assumes that url1 is an absolute URL and url2 is a relative path without a leading slash.

Sunny Patel
  • With url1 = 'build/en/index.html' and url2 = '/pub-assets/css/indexen.css', this results in: build/en//pub-assets/css/indexen.css – Milix Apr 17 '18 at 00:37
  • @Milix This was using your question as it was originally asked, without the leading slash in `url2`. If `url2` will always have a leading slash, drop the trailing slash from the base by removing the `+1` in the answer. – Sunny Patel Apr 17 '18 at 14:35
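The approach in this answer, adjusted so it works for both forms of url2 discussed in the comments, can be sketched as a small helper. `join_to_base_dir` is a hypothetical name; it swaps the manual `rfind` slicing for `posixpath.dirname` and strips any leading slash from the relative part, so the result never contains `//`:

```python
import posixpath
from urllib.parse import urlsplit


def join_to_base_dir(base_url, rel_path):
    """Hypothetical helper: append rel_path to the directory of base_url's path.

    A leading '/' on rel_path is ignored, so both forms from the comments
    produce a single slash between directory and relative path.
    """
    parts = urlsplit(base_url)
    base_dir = posixpath.dirname(parts.path)  # '/en/index.html' -> '/en'
    new_path = posixpath.join(base_dir, rel_path.lstrip('/'))
    # SplitResult is a namedtuple, so _replace returns a copy with the new path
    return parts._replace(path=new_path).geturl()


print(join_to_base_dir('https://example.com/en/index.html', 'pub-assets/css/indexen.css'))
# https://example.com/en/pub-assets/css/indexen.css
print(join_to_base_dir('build/en/index.html', '/pub-assets/css/indexen.css'))
# build/en/pub-assets/css/indexen.css
```

This still keeps the page's own directory (`build/en/`), so it gives `build/en/pub-assets/...` rather than the `build/pub-assets/...` the question expects.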

Python 3.6.1:

>>> u1 = 'https://example.com/en/index.html'
>>> u2 = 'pub-assets/css/indexen.css'
>>> import urllib.parse
>>> urllib.parse.urljoin(u1, u2)
'https://example.com/en/pub-assets/css/indexen.css'

Python 2.7.14:

>>> u1 = 'https://example.com/en/index.html'
>>> u2 = 'pub-assets/css/indexen.css'
>>> import urlparse
>>> urlparse.urljoin(u1, u2)
'https://example.com/en/pub-assets/css/indexen.css'

Note the different import in each version. I would double-check your Python version and import statement, and perhaps post more of your program.

sam
  • I'd encourage you to try parsing the urls using some more specific methods in [urllib.parse](https://docs.python.org/3/library/urllib.parse.html). – sam Apr 17 '18 at 03:19
  • Also, using Python 3.6.4, I can `urljoin` your example URLs and I get `build/en/pub-assets/css/indexen.css`. What version of Python are you using? – sam Apr 17 '18 at 03:22
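If one script has to run under both versions shown in this answer, a common compatibility pattern (an addition here, not part of the answer) is to try the Python 3 import first and fall back to the Python 2 module name. A minimal sketch:

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

# The call from the interactive sessions in the answer.
print(urljoin('https://example.com/en/index.html', 'pub-assets/css/indexen.css'))
# https://example.com/en/pub-assets/css/indexen.css

# A relative base without a scheme also works: the relative path is
# resolved against the base's directory (build/en/).
print(urljoin('build/en/index.html', 'pub-assets/css/indexen.css'))
# build/en/pub-assets/css/indexen.css
```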