That's because urllib.parse.urljoin is not made for this use case.
Example from the docs (https://docs.python.org/fr/3/library/urllib.parse.html#module-urllib.parse):
from urllib.parse import urljoin
new_url = urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
print(new_url)
Output:
http://www.cwi.nl/%7Eguido/FAQ.html
As written in the docs, urllib.parse.urljoin constructs a full ("absolute") URL by combining a "base URL" (base) with another URL (url).
In your example, you pass "https://test.com/endpoint" as the first parameter. Because it has no trailing slash, urllib.parse.urljoin treats "endpoint" as the last path segment (like a file name) and replaces it, so the effective "base URL" is "https://test.com/". It then appends the second parameter ("test.php"), which is why your output is "https://test.com/test.php".
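To make the effect of the trailing slash visible, here is a small sketch using the URLs from your question (the variable names are only for illustration):

```python
from urllib.parse import urljoin

# No trailing slash: the last segment "endpoint" is replaced.
no_slash = urljoin("https://test.com/endpoint", "test.php")
print(no_slash)    # https://test.com/test.php

# Trailing slash: "endpoint/" is kept as a directory.
with_slash = urljoin("https://test.com/endpoint/", "test.php")
print(with_slash)  # https://test.com/endpoint/test.php

# A leading slash on the second argument replaces the whole path.
rooted = urljoin("https://test.com/endpoint/", "/test.php")
print(rooted)      # https://test.com/test.php
```

So urljoin only gives the result you want when the base ends with a slash and the second part does not start with one.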
I think that your best option is to use the joinurl function posted by @tripleee, because it will not produce results like "endpoint//test.php" or "endpointtest.php".
But you should not use os.path.join if your code has to be cross-platform. On Windows, you will get a backslash instead of a slash ("https://test.com/endpoint\test.php").
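You can reproduce the difference on any machine, since os.path is backed by the ntpath module on Windows and by posixpath elsewhere, and both can be imported directly. This is only a sketch to illustrate the point, not a recommendation to use either module for URLs:

```python
import ntpath      # the implementation behind os.path on Windows
import posixpath   # the implementation behind os.path on Linux/macOS

base, page = "https://test.com/endpoint", "test.php"

# Windows rules: the parts are joined with a backslash.
windows_result = ntpath.join(base, page)
print(windows_result)  # https://test.com/endpoint\test.php

# POSIX rules: the parts are joined with a forward slash.
posix_result = posixpath.join(base, page)
print(posix_result)    # https://test.com/endpoint/test.php
```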
Here is a code sample for testing purposes:
def joinurl(baseurl, path):
    # Strip the slashes at the seam, then join with exactly one slash.
    return '/'.join([baseurl.rstrip('/'), path.lstrip('/')])
url_base = "https://test.com/endpoint"
web_page_name = "/test.php"
desired_output = "https://test.com/endpoint/test.php"
assert(joinurl("https://test.com/endpoint", "test.php") == desired_output)
assert(joinurl("https://test.com/endpoint/", "test.php") == desired_output)
assert(joinurl("https://test.com/endpoint", "/test.php") == desired_output)
assert(joinurl("https://test.com/endpoint/", "/test.php") == desired_output)