0

I found a lot of questions for this, but I found no practical answer.

I don't need http or www, users should be allowed to enter e.g. example.com

Allowed:

Not allowed:

  • example
  • 1
  • :::::
  • .....

problem with other questions:

So I think I need regular expression for this like in the last example above but a little bit extended. https://docs.python.org/3.6/library/re.html

With this code all examples are working but not "example.de/more"

def verify_url(self, url):
    url = url.strip()
    if url[-1] == ".":
        url = url[:-1]
    if url[-1] == "/":
        url = url[:-1]
    url = url.replace("https://", "") 
    url = url.replace("http://", "") 
    if url.startswith("www."):
        url = url.replace("www.", "")
    result = re.match(
        "^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}$",
        url)
    if result:
        return url
oxidworks
  • 1,563
  • 1
  • 14
  • 37

1 Answers1

0

It seems that you haven't tried it all yet! look at below example:

import re
match_cases = ['http://www.example.de', 'https://example.de/more', 'www.sub.example.de', 'example.de']

URL_REGEX = r"""(?i)\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:de|com)/)(?:[^\s()<>{}\[\]]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\))+(?:\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’])|(?:(?<!@)[a-z0-9]+(?:[.\-][a-z0-9]+)*[.](?:de|com)\b/?(?!@)))"""


for text in match_cases:
    url = re.findall(URL_REGEX, text)
    print(url)

refer below links for more detail:

Liberal Regex Pattern for Web URLs and few similar answers: https://stackoverflow.com/a/44645567/7664524

https://stackoverflow.com/a/44645124/7664524


Update

In your updated question you have used url.replace("https://", ""); this will replace every https:// from url that means some of the url which contains refer link to other url will be manipulated too.

Gahan
  • 4,075
  • 4
  • 24
  • 44
  • Thank you, almost perfect, but this is only accepting de or com, how to remove this limitation completely? or allow only 2-5 chars for example. – oxidworks Feb 15 '18 at 19:43
  • r"""(?i)\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:[a-z0-9]{2,5}|.)/)(?:[^\s()<>{}\[\]]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\))+(?:\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’])|(?:(?<!@)[a-z0-9]+(?:[.\-][a-z0-9]+)*[.](?:[a-z0-9]{2,5})\b/?(?!@)))""" – oxidworks Feb 15 '18 at 20:16
  • look at the regex part: `(?:de|com)/)` here you can include all possible postfix for domain you want to include, however if you try it with \s it will accept any string which will result to accept any domain like .aaa .aas, literally anything; the link I provided you contains all the list of domain(postfix) you might want to include in your code. – Gahan Feb 16 '18 at 05:17