7

I have a URL like this:

https://user:password@example.com/path?key=value#hash

The result should be:

https://user:???@example.com/path?key=value#hash

I could use a regex, but I would rather parse the URL into a high-level data structure, operate on that structure, and then serialize it back to a string.

Is this possible with Python?

Asclepius
guettli

3 Answers

18

You can use the built-in urlparse function to extract the password from a URL. It is available in both Python 2 and 3, but in different modules.

Python 2: from urlparse import urlparse

Python 3: from urllib.parse import urlparse

Example

from urllib.parse import urlparse

parsed = urlparse("https://user:password@example.com/path?key=value#hash")
parsed.password # 'password'

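# Rebuild netloc with the password masked. Note: this assumes credentials
# are present and drops any explicit port from the original URL.
replaced = parsed._replace(netloc="{}:{}@{}".format(parsed.username, "???", parsed.hostname))
replaced.geturl() # 'https://user:???@example.com/path?key=value#hash'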

See also this question: Changing hostname in a url
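
For URLs that may carry a port or no credentials at all, the same parse, modify, serialize idea can be sketched a bit more defensively. This is only a minimal sketch, not a definitive implementation: the function name `mask_password` and its `mask` parameter are made up for illustration, and it uses `urlsplit` instead of `urlparse` (both are in `urllib.parse`).

```python
from urllib.parse import urlsplit


def mask_password(url: str, mask: str = "???") -> str:
    """Hypothetical helper: replace the password in url with mask."""
    parts = urlsplit(url)

    # No password in the URL: return it untouched, so we never
    # produce something like "None:???@example.com".
    if parts.password is None:
        return url

    # Everything after the last "@" in netloc is host[:port]; keeping it
    # verbatim preserves an explicit port.
    host_port = parts.netloc.rpartition("@")[2]
    netloc = f"{parts.username}:{mask}@{host_port}"

    return parts._replace(netloc=netloc).geturl()


print(mask_password("https://user:password@example.com/path?key=value#hash"))
print(mask_password("https://user:password@example.com:8443/path"))
print(mask_password("https://example.com/path?key=value#hash"))
```

Slicing `netloc` rather than using `parsed.hostname` is a design choice here: `hostname` does not include the port, so formatting it back into `netloc` would lose that part of the URL.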

alxwrd
  • 1
    Not a very good answer, this will return something like `https://None:???@example.com/` if there was no username and password in the first place. – Patrick Feb 03 '21 at 15:33
  • 3
    @Patrick please feel free to leave your own answer, or submit an edit request for this answer, if you think some more information should be provided. – alxwrd Feb 04 '21 at 10:54
  • 3
    @Patrick like most answers here, this shows how to do the thing that was asked. It can be worked into comprehensive code by the reader. A trivial if statement can check to see if `parsed.password` and `parsed.username` exist and adjust behaviour accordingly. – Philip Couling Mar 26 '21 at 16:16
1
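from urllib.parse import urlparse

def redact_url(url: str) -> str:
    """Return url with any password replaced by ???."""
    url_components = urlparse(url)
    # Only rewrite netloc when credentials are present, so URLs without
    # userinfo are returned unchanged.
    if url_components.username or url_components.password:
        # Keep host[:port] verbatim so an explicit port is not dropped.
        host = url_components.netloc.rpartition("@")[2]
        url_components = url_components._replace(
            netloc=f"{url_components.username}:???@{host}",
        )

    return url_components.geturl()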
scottsome
0

The pip package already has an internal utility function that does this, masking the password with **** rather than ???.

>>> from pip._internal.utils.misc import redact_auth_from_url
>>> 
>>> redact_auth_from_url("https://user:password@example.com/path?key=value#hash")
'https://user:****@example.com/path?key=value#hash'
>>> redact_auth_from_url.__doc__
'Replace the password in a given url with ****.'

This returns the expected result even if the URL does not contain a username or password.

>>> redact_auth_from_url("https://example.com/path?key=value#hash") 
'https://example.com/path?key=value#hash'
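
Since `pip._internal` is a private module and not a supported public API, the import above could break with a pip upgrade. One way to hedge against that is a guarded import with a stdlib fallback. The fallback below is a hypothetical stand-in written for illustration, not pip's actual code; it only aims to match the two cases shown in this answer and may differ from pip for other inputs.

```python
from urllib.parse import urlsplit

try:
    # Private pip helper shown in the answer; it may move or disappear.
    from pip._internal.utils.misc import redact_auth_from_url
except ImportError:
    def redact_auth_from_url(url: str) -> str:
        """Hypothetical stand-in: replace the password in a URL with ****."""
        parts = urlsplit(url)
        # Leave URLs without a password untouched.
        if parts.password is None:
            return url
        # Split "user:password@host:port" at the last "@".
        userinfo, _, host_port = parts.netloc.rpartition("@")
        user = userinfo.partition(":")[0]
        return parts._replace(netloc=f"{user}:****@{host_port}").geturl()


print(redact_auth_from_url("https://user:password@example.com/path?key=value#hash"))
print(redact_auth_from_url("https://example.com/path?key=value#hash"))
```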
Abdul Niyas P M