
I am trying to build a URL by joining some dynamic components. I thought of using something like os.path.join(), but for URLs. From my research, urlparse.urljoin() does something similar. However, it looks like it only takes two arguments at a time.

I have the following so far which works but looks repetitive:

    a = urlparse.urljoin(environment, schedule_uri)
    b = urlparse.urljoin(a, str(events_to_hours))
    c = urlparse.urljoin(b, str(events_from_date))
    d = urlparse.urljoin(c, str(api_version))
    e = urlparse.urljoin(d, str(id))
    url = e + '.json'

Output: http://example.com/schedule/12/20160322/v1/1.json

The above works and I tried to make it shorter this way:

    url_join_items = [environment, schedule_uri, str(events_to_hours),
                      str(events_from_date), str(api_version), str(id), ".json"]
    new_url = ""
    for url_items in url_join_items:
        new_url = urlparse.urljoin(new_url, url_items)

Output: http://example.com/schedule/.json

But the second implementation does not work. Please suggest how to fix it, or a better way of doing this.

EDIT 1: Unfortunately, the reduce solution produces the same output: http://example.com/schedule/.json

summerNight
  • @idjaw: In my case I am already sure I want to use `urlparse`, I just need a cleaner and shorter way of joining more than 2 components to the same url – summerNight Mar 23 '16 at 21:57
  • Why doesn't it work? It looks mostly good to me. What's the output and what's wrong with it? You probably want to stick to string concatenation for the `.json` so that your URL doesn't end in `/.json` though. Also I don't know how it's going to handle an empty string so you may want to start with `new_url = url_join_items[0]` and then iterate over `url_join_items[1:]`. But the answer with reduce is probably better. – Alex Hall Mar 23 '16 at 22:03
  • @AlexHall: Please see my EDIT 1 and Output from each method – summerNight Mar 23 '16 at 22:19
  • OK, again, leave the `.json` out of `url_join_items`. Otherwise, I can't understand why so many parts are missing. What happens if you `print url_join_items`? – Alex Hall Mar 23 '16 at 22:23
  • @AlexHall: I get this: `['http://example.com', 'schedule/', '12', '20160322', 'v1', '100', '.json']` – summerNight Mar 23 '16 at 22:27
  • `urljoin('http://example.com/schedule/12', '20160322')` returns `'http://example.com/schedule/20160322'` which is probably not what you want. I'm not sure why it does that but that's your problem. Perhaps avoid it and simply use `'/'.join(url_join_items)`. You'll want to make sure there's no extra slashes in the items though. A quick way is `item.strip('/')` but if there's more than one slash at the end it'll strip them all which you probably don't want. – Alex Hall Mar 23 '16 at 22:43
  • In my response, I provide a link to a related question that explains what urljoin is doing, and why you are seeing the results with "missing" parts. I also suggest "/".join, which is straightforward and concise. – svohara Mar 24 '16 at 01:07
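
For reference, the behaviour discussed in these comments can be reproduced in a few lines. This is a minimal sketch in Python 3 (`urllib.parse`; the question's `urlparse` module is the Python 2 name), using the item values printed above. `urljoin` resolves its second argument the way a browser resolves a relative link, so when the base does not end in a slash, its last path segment is replaced rather than extended.

```python
from urllib.parse import urljoin

# Base without a trailing slash: the last segment ("12") is replaced.
without_slash = urljoin("http://example.com/schedule/12", "20160322")
print(without_slash)  # http://example.com/schedule/20160322

# Base with a trailing slash: the new segment is appended.
with_slash = urljoin("http://example.com/schedule/12/", "20160322")
print(with_slash)  # http://example.com/schedule/12/20160322

# Replaying the loop from the question with the printed items.
items = ["http://example.com", "schedule/", "12", "20160322", "v1", "100", ".json"]
new_url = ""
for item in items:
    new_url = urljoin(new_url, item)
print(new_url)  # http://example.com/schedule/.json
```

Only `schedule/` ends in a slash, so every later item overwrites the one before it, which would explain the "missing" parts.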

5 Answers


Using join

Have you tried simply "/".join(url_join_items)? Doesn't HTTP always use the forward slash? You might have to set up the prefix "https://" and the suffix manually, though.

Something like:

url = "https://{}.json".format("/".join(url_join_items))
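
A slightly more defensive version of the same idea, as a minimal sketch: strip stray slashes from each piece before joining, so that no doubled or missing separators appear. The helper name `join_url` and its `suffix` parameter are made up for illustration, and the sample values are the ones printed in the question's comments.

```python
def join_url(base, *parts, suffix=""):
    """Join base and parts with single slashes, then append suffix."""
    # Keep the scheme's "//" intact by only trimming the right side of base.
    pieces = [base.rstrip("/")]
    # Trim both sides of every other part and skip parts that end up empty.
    pieces += [s for s in (str(p).strip("/") for p in parts) if s]
    return "/".join(pieces) + suffix


url = join_url("http://example.com", "schedule/", 12, 20160322, "v1", 100, suffix=".json")
print(url)  # http://example.com/schedule/12/20160322/v1/100.json
```

Passing the suffix separately keeps `.json` from being treated as a path segment of its own.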

Using reduce and urljoin

Here is a related question on SO that explains to some degree the thinking behind the implementation of urljoin. Your use case does not appear to be the best fit.

When using reduce and urljoin, I'm not sure the result will be what the question intends, which is something semantically like os.path.join, but for URLs. Consider the following:

from urllib.parse import urljoin
from functools import reduce

parts_1 = ["a","b","c","d"]
parts_2 = ["https://","server.com","somedir","somefile.json"]
parts_3 = ["https://","server.com/","somedir/","somefile.json"]

out1 = reduce(urljoin, parts_1)
print(out1)

d

out2 = reduce(urljoin, parts_2)
print(out2)

https:///somefile.json

out3 = reduce(urljoin, parts_3)
print(out3)

https:///server.com/somedir/somefile.json

Note that with the exception of the extra "/" after the https prefix, the third output is probably closest to what the asker intends, except we've had to do all the work of formatting the parts with the separator.
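
If `reduce` and `urljoin` are still preferred, one option is to normalize the slashes at every step instead of in the input list. The following is only a sketch: `join_all` is a hypothetical name, and it assumes the first argument is an absolute base URL and the remaining parts are plain path segments.

```python
from functools import reduce
from urllib.parse import urljoin


def join_all(base, *parts):
    """Fold urljoin over parts, treating the accumulated URL as a directory."""

    def step(acc, part):
        # A trailing slash on the base makes urljoin append instead of replace;
        # a leading slash on the part would reset the path, so drop it.
        return urljoin(acc.rstrip("/") + "/", str(part).lstrip("/"))

    return reduce(step, parts, base)


print(join_all("https://server.com", "somedir", "somefile.json"))
# https://server.com/somedir/somefile.json
```

This keeps the scheme and host handling inside `urljoin`, while the slash bookkeeping lives in one place.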

user1857492
svohara

I also needed something similar and came up with this solution:

from urllib.parse import urljoin, quote_plus

def multi_urljoin(*parts):
    return urljoin(parts[0], "/".join(quote_plus(part.strip("/"), safe="/") for part in parts[1:]))

print(multi_urljoin("https://server.com", "path/to/some/dir/", "2019", "4", "17", "some_random_string", "image.jpg"))

This prints 'https://server.com/path/to/some/dir/2019/4/17/some_random_string/image.jpg'
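
One detail that may matter: `quote_plus` encodes a space as `+`, which is the form-encoding convention for query strings, whereas `quote` produces `%20`, which is what a path segment normally expects. A possible variant is sketched below; `multi_urljoin_quoted` is a hypothetical name, not part of the solution above, and the sample parts are made up.

```python
from urllib.parse import quote, quote_plus, urljoin

print(quote_plus("my file.jpg", safe="/"))  # my+file.jpg
print(quote("my file.jpg", safe="/"))       # my%20file.jpg


def multi_urljoin_quoted(*parts):
    """Like multi_urljoin, but percent-encodes spaces as %20."""
    path = "/".join(quote(part.strip("/"), safe="/") for part in parts[1:])
    return urljoin(parts[0], path)


print(multi_urljoin_quoted("https://server.com", "some dir/", "my file.jpg"))
# https://server.com/some%20dir/my%20file.jpg
```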

Klemen Tusar

Here's a slightly silly but workable solution, given that `parts` is a list of URL parts in order:

my_url = '/'.join(parts).replace('//', '/').replace(':/', '://')

I wish `replace` had a `from` option, but it does not, so the second call is there to restore the double slash in `https://`.

The nice thing is that you don't have to worry about whether the parts already have slashes.
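
A quick check of the edge cases, as a sketch with made-up sample parts: because `str.replace` scans left to right without overlapping matches, a run of three slashes (which appears when one part ends with a slash and the next starts with one) is only reduced to two.

```python
def join_with_replace(parts):
    """The one-liner from above, wrapped in a function for testing."""
    return '/'.join(parts).replace('//', '/').replace(':/', '://')


# At most one stray slash per boundary: collapsed as expected.
print(join_with_replace(["https://example.com/", "api", "v1"]))
# https://example.com/api/v1

# Slashes on both sides of a boundary: a double slash survives.
print(join_with_replace(["https://example.com/", "/api/", "/v1"]))
# https://example.com//api//v1
```

So the one-liner is safe as long as each boundary between parts carries at most one extra slash.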

Bostone

A simple solution would be:

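import re


def url_join(*parts: str) -> str:
    # Join everything with "/", then collapse any run of slashes into one.
    line = '/'.join(parts)
    line = re.sub('/{2,}', '/', line)
    # Collapsing also turned "https://" into "https:/", so restore it.
    return re.sub(':/', '://', line)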
dzav

This is what worked best for me:

from urllib.parse import urljoin


def join_url_parts(base: str, parts: list[str], allow_fragments: bool = True) -> str:
    """Join multiple URL parts together.

    See the examples below. All of them would produce the same result:
    `https://example.com/api/v1/users/`

        print(join_url_parts("https://example.com", ["api", "v1", "users"]))
        print(join_url_parts("https://example.com", ["api", "v1/", "users"]))
        print(join_url_parts("https://example.com/", ["api/", "v1/", "users/"]))
        print(join_url_parts("https://example.com/", ["/api/", "/v1/", "users/"]))
    """
    url = "/".join(map(lambda x: str(x).strip("/"), parts)) + "/"
    return urljoin(base, url, allow_fragments)

This basically replicates the standard urljoin, but allows the second argument to be a list of parts (strings).

Artur Barseghyan