70

I am trying to use Python to change the hostname in a URL, and have been playing around with the urlparse module for a while now without finding a satisfactory solution. As an example, consider the URL:

https://www.google.dk:80/barbaz

I would like to replace "www.google.dk" with e.g. "www.foo.dk", so I get the following url:

https://www.foo.dk:80/barbaz.

So the part I want to replace is what urlparse.urlsplit refers to as hostname. I had hoped that the result of urlsplit would let me make changes, but the resulting type ParseResult doesn't allow me to. If nothing else I can of course reconstruct the new url by appending all the parts together with +, but this would leave me with some quite ugly code with a lot of conditionals to get "://" and ":" in the correct places.

  • I was trying to avoid any if statements, as it may vary whether the base url has a port number or not. Based on your answers though, it does not seem like I can avoid it :-). Thanks for your help. – Rikke Bendlin Gammelmark Feb 10 '14 at 11:42

7 Answers

118

You can use the urllib.parse.urlparse function and the ParseResult._replace method (Python 3):

>>> import urllib.parse
>>> parsed = urllib.parse.urlparse("https://www.google.dk:80/barbaz")
>>> replaced = parsed._replace(netloc="www.foo.dk:80")
>>> print(replaced)
ParseResult(scheme='https', netloc='www.foo.dk:80', path='/barbaz', params='', query='', fragment='')

If you're using Python 2, then replace urllib.parse with urlparse.

ParseResult is a subclass of namedtuple and _replace is a namedtuple method that:

returns a new instance of the named tuple replacing specified fields with new values
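
To make the "new instance" part concrete, here is a small sketch (Python 3, using the example URL from the question). It assumes you want a string back at the end, which is what geturl() gives you; the variable names are only for illustration.

```python
from urllib.parse import urlsplit

original = urlsplit("https://www.google.dk:80/barbaz")

# _replace does not modify `original`; it returns a new named tuple.
updated = original._replace(netloc="www.foo.dk:80")

# geturl() reassembles the parts into a URL string.
original_url = original.geturl()
updated_url = updated.geturl()
print(original_url)
print(updated_url)
```

So the result of _replace has to be assigned to something, otherwise the change is lost.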

UPDATE:

As @2rs2ts said in the comments, the netloc attribute includes the port number.

Good news: ParseResult has hostname and port attributes. Bad news: hostname and port are not fields of the named tuple. They are properties computed from netloc, so you can't do parsed._replace(hostname="www.foo.dk"). It will raise an exception.
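
A quick way to see the difference, as a sketch in Python 3: _fields lists the real tuple fields, while hostname and port are only readable. The exact exception type is left open here on purpose (both TypeError and ValueError are caught), and replace_failed is just an illustrative flag.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.google.dk:80/barbaz")

# Only these names can be passed to _replace.
fields = parsed._fields
print(fields)

# hostname and port are derived from netloc when accessed.
print(parsed.hostname, parsed.port)

try:
    parsed._replace(hostname="www.foo.dk")
    replace_failed = False
except (TypeError, ValueError):
    # hostname is not a tuple field, so _replace rejects it.
    replace_failed = True
print(replace_failed)
```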

If you don't want to split on : yourself, and your URL always has a port number and no username or password (that is, it doesn't look like "https://username:password@www.google.dk:80/barbaz"), you can do:

parsed._replace(netloc="{}:{}".format("www.foo.dk", parsed.port))
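
If the port and the username/password may or may not be there, something along these lines might do. It is only a sketch: replace_hostname is a made-up helper name, not part of urllib, and it assumes new_hostname is a plain host name (an IPv6 literal would need its own brackets).

```python
from urllib.parse import urlsplit


def replace_hostname(url, new_hostname):
    """Return url with only the host part of its netloc swapped out."""
    parts = urlsplit(url)
    # Keep any "user:password@" prefix exactly as it was written.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = userinfo + at + new_hostname
    # Re-attach the port only if the original URL had one.
    if parts.port is not None:
        netloc += ":{}".format(parts.port)
    return parts._replace(netloc=netloc).geturl()


print(replace_hostname("https://www.google.dk:80/barbaz", "www.foo.dk"))
print(replace_hostname("https://www.google.dk/barbaz", "www.foo.dk"))
```

There is still one if in there for the port, but the "://" and ":" placement is left to geturl().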
Nigel Tufnel
  • 1
    Note that the hostname is called the `netloc` and it includes any port numbers. This answer shows that but doesn't make it explicit. – 2rs2ts Feb 07 '14 at 13:35
  • 21
    Using a private method `_replace` doesn't feel right. – Flimm Feb 12 '16 at 12:22
  • 62
    `_replace` is a part of `namedtuple` public API. It just starts with the underscore to avoid conflicts with the field names. – Nigel Tufnel Feb 12 '16 at 15:11
  • 2
    A heads up - `netloc` also includes username and password. If you parse something like `'https://user:hunter2@example.com:444/path'` your `netloc` would be `'user:hunter2@example.com:444'`. – Benjamin Manns Aug 21 '18 at 14:42
  • 1
    urlparse is not an importable library in pip and as such, this does not work because "import urlparse" does not work. – b264 Feb 27 '19 at 22:21
  • Looks like you're using Python 3. Good for you! I've updated my answer for Python 3. – Nigel Tufnel Mar 11 '19 at 16:35
29

You can take advantage of urlsplit and urlunsplit from Python 2's urlparse module (in Python 3 they live in urllib.parse):

>>> from urlparse import urlsplit, urlunsplit
>>> url = list(urlsplit('https://www.google.dk:80/barbaz'))
>>> url
['https', 'www.google.dk:80', '/barbaz', '', '']
>>> url[1] = 'www.foo.dk:80'
>>> new_url = urlunsplit(url)
>>> new_url
'https://www.foo.dk:80/barbaz'

As the docs state, the argument passed to urlunsplit() "can be any five-item iterable", so the above code works as expected.
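
Since any five-item iterable will do, the list is optional. A sketch of the same idea in Python 3 with a plain tuple, unpacking the parts by name rather than by index:

```python
from urllib.parse import urlsplit, urlunsplit

# urlsplit returns five parts, so it unpacks into five names.
scheme, netloc, path, query, fragment = urlsplit("https://www.google.dk:80/barbaz")

# Pass the parts back as a tuple, with the netloc swapped.
new_url = urlunsplit((scheme, "www.foo.dk:80", path, query, fragment))
print(new_url)
```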

linkyndy
9

Using the urlparse and urlunparse functions of the urlparse module (Python 2; in Python 3 they are in urllib.parse):

import urlparse

old_url = 'https://www.google.dk:80/barbaz'
url_lst = list(urlparse.urlparse(old_url))
# Now url_lst is ['https', 'www.google.dk:80', '/barbaz', '', '', '']
url_lst[1] = 'www.foo.dk:80'
# Now url_lst is ['https', 'www.foo.dk:80', '/barbaz', '', '', '']
new_url = urlparse.urlunparse(url_lst)

print(old_url)
print(new_url)

Output:

https://www.google.dk:80/barbaz
https://www.foo.dk:80/barbaz
Omid Raha
6

A simple string replace of the host in the netloc also works in most cases:

>>> import urlparse  # Python 2; in Python 3 use: import urllib.parse as urlparse
>>> p = urlparse.urlparse('https://www.google.dk:80/barbaz')
>>> p._replace(netloc=p.netloc.replace(p.hostname, 'www.foo.dk')).geturl()
'https://www.foo.dk:80/barbaz'

This will not work if, by some chance, the user name or password contains the hostname. str.replace cannot be limited to the last occurrence only, so instead we can use rsplit and join:

>>> p = urlparse.urlparse('https://www.google.dk:www.google.dk@www.google.dk:80/barbaz')
>>> new_netloc = 'www.foo.dk'.join(p.netloc.rsplit(p.hostname, 1))
>>> p._replace(netloc=new_netloc).geturl()
'https://www.google.dk:www.google.dk@www.foo.dk:80/barbaz'
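
One more case worth checking before relying on string replacement: hostname is returned in lower case, while netloc keeps the original spelling. The sketch below (Python 3, with a made-up mixed-case variant of the example URL) shows the replace silently doing nothing in that situation.

```python
from urllib.parse import urlparse

p = urlparse("https://WWW.Google.dk:80/barbaz")

# hostname is normalised to lower case; netloc is kept as written.
host_found = p.hostname in p.netloc

# The replace finds nothing to replace, so the URL comes back unchanged.
result = p._replace(netloc=p.netloc.replace(p.hostname, "www.foo.dk")).geturl()
print(host_found, result)
```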
Alexandre Hamez
David Morley
  • _replace is private, should not be used by client code. – gb. Jun 03 '16 at 03:37
  • 1
    Better than accepted answer, especially the second option. – nirvana-msu Dec 22 '17 at 01:36
  • 5
    @gb: _replace is not private in NamedTuple. It's part of the API: https://docs.python.org/2/library/collections.html#collections.namedtuple – kbyrd Feb 01 '18 at 16:48
  • Yup, `_replace` is not private. Quoting the [v3 doc](https://docs.python.org/3/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields): *To prevent conflicts with field names, the method and attribute names start with an underscore.* Much better option than putzing around with list indices as done in the other answers. – JL Peyret Jun 05 '20 at 23:45
  • Although, `_replace` is only part of the story as it returns a new tuple rather than mutating the old. so `newurl = urlunsplit(urlsplit(url)._replace(netloc=""))`, _replace on `p` as above has no effect – JL Peyret Jun 06 '20 at 00:07
5

I would also recommend using urlsplit and urlunsplit, as in @linkyndy's answer, but for Python 3 it would be:

>>> from urllib.parse import urlsplit, urlunsplit
>>> url = list(urlsplit('https://www.google.dk:80/barbaz'))
>>> url
['https', 'www.google.dk:80', '/barbaz', '', '']
>>> url[1] = 'www.foo.dk:80'
>>> new_url = urlunsplit(url)
>>> new_url
'https://www.foo.dk:80/barbaz'
eLRuLL
4

You can always do this trick:

>>> from urllib import parse
>>> p = parse.urlparse("https://stackoverflow.com/questions/21628852/changing-hostname-in-a-url")
>>> parse.ParseResult(**dict(p._asdict(), netloc='perrito.com.ar')).geturl()
'https://perrito.com.ar/questions/21628852/changing-hostname-in-a-url'
3

To just replace the host without touching the port in use (if any), use this:

import re, urlparse

p = list(urlparse.urlsplit('https://www.google.dk:80/barbaz'))
p[1] = re.sub(r'^[^:]*', 'www.foo.dk', p[1])  # replace everything up to the first ':'
print(urlparse.urlunsplit(p))

prints

https://www.foo.dk:80/barbaz

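This also works when the URL has no port. Note that the pattern assumes the netloc has no "user:password@" prefix; with one, it would replace the user name instead of the host.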

If you prefer the _replace way Nigel pointed out, you can use this instead:

p = urlparse.urlsplit('https://www.google.dk:80/barbaz')
p = p._replace(netloc=re.sub(r'^[^:]*', 'www.foo.dk', p.netloc))
print(urlparse.urlunsplit(p))
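
If the URL might carry a "user:password@" part, the pattern can be widened to step over it. This is only a sketch for Python 3: swap_host is a hypothetical helper name, and the pattern is a variation on the one above, not something the module provides.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def swap_host(url, new_host):
    """Replace the host in url, keeping any userinfo and port."""
    parts = urlsplit(url)
    # Group 1 captures an optional "user:password@" prefix so it can be kept;
    # [^:]* then matches the host up to the port separator.
    netloc = re.sub(r'^(.*@)?[^:]*',
                    lambda m: (m.group(1) or '') + new_host,
                    parts.netloc)
    return urlunsplit(parts._replace(netloc=netloc))


print(swap_host('https://www.google.dk:80/barbaz', 'www.foo.dk'))
print(swap_host('https://u:p@www.google.dk:80/barbaz', 'www.foo.dk'))
```

Like the original pattern, this does not try to handle bracketed IPv6 hosts.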
Alfe
  • @Downvoter: Care to mention what you didn't like? A downvote without reason (not obvious) isn't helpful at all. I'd like to improve my answer, if possible. – Alfe Sep 19 '18 at 09:01