2

What's the best way to validate that a string is a valid URL path in Python?

I want to avoid maintaining a complex regex if possible.

I have this regex so far but I am looking for alternatives because it does not cover all cases for just the path segment:

import re

def check_valid_path(v):
    # Accept only absolute paths made of lowercase letters, digits, "-", "_" and "/"
    pattern = r"^\/[-a-z\d_\/]*$"
    if not re.compile(pattern).match(v):
        raise ValueError(f"Invalid path: {v}. Must match {pattern}")
martin8768
  • 1
    https://docs.python.org/3/library/urllib.parse.html#module-urllib.parse? – jonrsharpe Aug 22 '23 at 07:20
  • 1
    @jonrsharpe From the doc: `Warning urlparse() does not perform validation. See URL parsing security for details.` – Gugu72 Aug 22 '23 at 08:20
  • 2
    Not sure there is a satisfying answer, but the question is essentially the same one that was asked here: https://stackoverflow.com/questions/7160737/how-to-validate-a-url-in-python-malformed-or-not – slothrop Aug 22 '23 at 08:26
  • From [pydantic - url validation regex](https://github.com/pydantic/pydantic/blob/ca2fbdc5463a367b7b61f7ca437cfeae7c6ba524/pydantic/v1/networks.py#L342), check the URL regex against your requirements – sahasrara62 Aug 22 '23 at 10:16

2 Answers

0

Pydantic is a popular data validation package with many ready-made options for common scenarios. It offers several URL types; here's an example with HttpUrl.

from pydantic import BaseModel, HttpUrl, ValidationError

class UrlValidator(BaseModel):
    url: HttpUrl

def validate(url: str):
    try:
        UrlValidator(url=url)
    except ValidationError:
        print("Not a valid url")
    else:
        print("Valid url")
>>> validate("asd")
Not a valid url
>>> validate("https://stackoverflow.com/questions/76950928/validating-a-path-in-python")
Valid url
edd313
  • This only works for full URLs, I am validating just the path part – martin8768 Aug 25 '23 at 09:29
  • Mine was just an example; you need to pick the field type that fits your case. Have a look at `FileUrl` or `FilePath`, or simply use `Path` from `pathlib`. – edd313 Aug 25 '23 at 09:44
-1

You can use the urllib.parse.urlparse function to parse the URL. Then, you can check if the scheme and netloc components are empty and the path component is not empty. This generally indicates a valid URL path.
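
A minimal sketch of that check, assuming only the three conditions described above. The function name `looks_like_url_path` is hypothetical and not part of `urllib`:

```python
from urllib.parse import urlparse


def looks_like_url_path(value: str) -> bool:
    """Heuristic: True if `value` parses as a bare path (no scheme, no host).

    `looks_like_url_path` is an illustrative name, not a standard-library API.
    """
    parts = urlparse(value)
    # Empty scheme and netloc mean the input is not a full or
    # protocol-relative URL; a non-empty path means something is left over.
    return parts.scheme == "" and parts.netloc == "" and parts.path != ""
```

Note that this only inspects how the string is split into components. It does not reject characters that are not allowed in a path, so an input such as "/a b" still passes.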

  • 3
    While this is useful, it does not check if the path is valid: From the doc: `Warning urlparse() does not perform validation. See URL parsing security for details.` – Gugu72 Aug 22 '23 at 08:21