In a Delphi 10.4.2 Win32 VCL application on Windows 10, I am trying to check whether a string is a valid URL.

Of course, I have examined the answers at: https://stackoverflow.com/search?q=delphi+check+valid+url
and: What is the best regular expression to check if a string is a valid URL?

A few of those regular expressions are so long (e.g. 5500 characters) that they cannot be pasted as a string constant in the Delphi code editor. Others simply don't work in this context (Delphi).

This is what I tried, using TRegEx and ShLwApi:

function TformMain.IsValidURL(const AUrl: string): Boolean;
const
  RE = '/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+(:[0-9]+)?|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/';
begin
  Result := False;
  if AUrl = '' then EXIT;

  // Does not work: 'https://www.google.c' is detected as valid:
  //Result := TRegEx.IsMatch(AUrl, '\A\b(?:(?:https?|ftps?|file)://|www\.|ftp|com\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]\z', [roIgnoreCase]);

  // Does not work: almost everything starting with 'https:' is valid:
  //Result := Boolean(ShLwApi.PathIsURL(PChar(AUrl)));

  // Does not work with 'https://www.google.com':
  //Result := TRegEx.IsMatch(AUrl, RE, [roIgnoreCase]);
end;

The solution should be only string-based (not connecting to the Internet).

I suspect that there must be a very simple solution.

user1580348
  • "_they cannot be pasted as a string constant in the Delphi code editor_" - you can concatenate several literals into one constant, which then easily holds 5500 characters. It doesn't have to be one long literal/line: `const RE= 'one'+ 'two'+ 'three'...;` – AmigoJack Aug 05 '21 at 12:00
  • `[\+~%\/.\w-_]` in the regex might be treated as invalid range - did it even compile? Change `-` into `\-` to make sure the regex engine understands what you want. – AmigoJack Aug 05 '21 at 12:04
  • This might not apply to your scenario, but for the benefit of others who might see this StackOverflow question: Please note that a URL might look a bit different than the schoolbook example `http://www.example.com`. For instance, the set of TLDs is increasing: `example.beer`, `example.theatre`, `example.sydney`, and [many others](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains#ICANN-era_generic_top-level_domains). This list might be expanded in the future, so it is unwise to hardcode a list of allowed TLDs. Also, a URL might not have a TLD: `rejbrandcloud` or `127.0.0.1:80`. – Andreas Rejbrand Aug 05 '21 at 12:20
  • And this is also a valid URL: `http://admin:1grg34bAA@hörsës/things(1,2)?a=5#µ`. – Andreas Rejbrand Aug 05 '21 at 12:22
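
The two comments above, on splitting a long pattern across several literals and on escaping the hyphen inside a character class, can be sketched as follows. The pattern here is a deliberately short, illustrative one chosen for this sketch; it is not one of the expressions from the question and it is not a complete URL validator.

```delphi
program LongRegexConst;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions;

const
  // Illustrative pattern only (an assumption of this sketch).
  // Each piece is a separate literal, so no single literal or source
  // line has to hold the whole expression.
  RE =
    '\Ahttps?://' +           // scheme
    '[A-Za-z0-9.\-]+' +       // host; "\-" is a literal hyphen, not a range
    '(?:/[\w+~%/.\-]*)?' +    // optional path, hyphen escaped again
    '\z';                     // end of input

begin
  // TRegEx takes the bare pattern string and a set of options.
  Writeln(TRegEx.IsMatch('https://www.google.com', RE, [roIgnoreCase]));
end.
```

A very long expression can be built the same way, one manageable literal per line.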

1 Answer

Delphi 10.4.2 offers a record TURI in System.Net.URLClient.pas. Calling the constructor Create with your URL will raise an ENetURIException for an invalid URL.
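
A minimal sketch of that approach, assuming Delphi 10.4.2 behaviour as described in this answer. The helper name `IsParsableURL` is hypothetical and chosen for this example, and the extra scheme check is an assumption of the sketch rather than something `TURI` requires.

```delphi
program CheckUrlSyntax;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Net.URLClient;

// Hypothetical helper (not part of the RTL): True when TURI.Create
// accepts the string without raising ENetURIException.
function IsParsableURL(const AUrl: string): Boolean;
var
  Uri: TURI;
begin
  try
    Uri := TURI.Create(AUrl);
    // Extra check assumed for this sketch: require a non-empty scheme.
    Result := Uri.Scheme <> '';
  except
    on ENetURIException do
      Result := False;
  end;
end;

begin
  Writeln(IsParsableURL('https://www.google.com'));
  Writeln(IsParsableURL('https://duck'));
  Writeln(IsParsableURL('https://'));
end.
```

This only tests whether the string parses as a URI. It makes no statement about whether the host exists or is reachable.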

Uwe Raabe
  • In 10.3, I don't get an exception, but maybe the record has been updated in 10.4? – Andreas Rejbrand Aug 05 '21 at 12:09
  • Using the `TURI` record in Delphi 10.4.2, EVERYTHING starting with `https://` does NOT create an exception, e.g. `https://duck`. And `https://` alone DOES create an exception. – user1580348 Aug 05 '21 at 12:24
  • The fact that `https://duck` doesn't raise is very good, because this is certainly a valid URL. For instance, if I rename my personal cloud `duck`, then I would type `https://duck` in Firefox to access it. – Andreas Rejbrand Aug 05 '21 at 12:27
  • @user1580348 *"EVERYTHING starting with https:// does NOT create an exception"* If you put a non-ASCII character (e.g. an accented letter), I doubt it will accept it. – Olivier Aug 05 '21 at 12:32
  • @Olivier: It will. Apparently you haven't visited my website https://ändlöslängtan.se/. – Andreas Rejbrand Aug 05 '21 at 12:33
  • @user1580348: I understand that you want to forbid `https://google.c`. One approach would be to parse the string using `TURI` and then extract the TLD from the record's `Host` property and see if it is part of a hard-coded list of known TLDs. But, of course, this will make your app malfunction the next time a new TLD is added (and if the URL doesn't contain a TLD, like a pure IP address or computer name). – Andreas Rejbrand Aug 05 '21 at 12:36
  • @AndreasRejbrand I think domain names with special characters are converted to some ASCII representation (with an `xn--` prefix)? – Olivier Aug 05 '21 at 12:38
  • @Olivier: Yes, they are. And `TURI` does that. – Andreas Rejbrand Aug 05 '21 at 12:38
  • @AndreasRejbrand BTW, you should fix your site URL in your profile because the `www.` prefix makes it invalid... – Olivier Aug 05 '21 at 12:39
  • (Or are *single-character* TLDs syntactically forbidden? I don't know, but if so, just check the TLD's length.) – Andreas Rejbrand Aug 05 '21 at 12:39
  • I am in a sort of dilemma here: The validation should check whether the URL is working with `Winapi.WinInet.InternetOpen` without connecting to the Internet. That is logically impossible, isn't it? – user1580348 Aug 05 '21 at 12:40
  • @Olivier: Ah, thank you for pointing that out. – Andreas Rejbrand Aug 05 '21 at 12:41
  • @user1580348: Well, you must always take into account that `InternetOpen` may fail in any case. – Andreas Rejbrand Aug 05 '21 at 12:45
  • I have realized that this kind of URL STRING validation is impossible, like for example a function that checks whether a string is polite or not: `function StringIsPolite(const S: string): Boolean;` – user1580348 Aug 05 '21 at 12:56
  • @user1580348: That sounds like a very wise conclusion. (A schoolbook example: Don't write a function that checks if you have write access to a particular directory. Instead, try to write to it and handle any failure that may occur.) – Andreas Rejbrand Aug 05 '21 at 13:00
  • What I wanted to say: The `Internet` is a reference system (aka `universe`) completely outside of the realm of a programming language: You cannot make statements about things in that other universe without entering it and becoming a part of it. – user1580348 Aug 05 '21 at 13:49
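
For readers who still want the length check floated in the comments above (parse with `TURI`, then look at the last label of `Host`), a rough sketch could look like this. The function name `HasPlausibleTld` and the two-character minimum are assumptions made for illustration. As the comments warn, a check like this also rejects valid URLs without a TLD, such as `https://duck` or a bare IP address.

```delphi
program TldLengthCheck;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Net.URLClient;

// Hypothetical heuristic (not part of the RTL): True when the URL parses
// and the last dot-separated label of the host has at least two characters.
function HasPlausibleTld(const AUrl: string): Boolean;
var
  Uri: TURI;
  LastDot: Integer;
begin
  Result := False;
  try
    Uri := TURI.Create(AUrl);
  except
    on ENetURIException do
      Exit;
  end;
  // LastDelimiter returns the 1-based index of the last '.', or 0 if none.
  LastDot := LastDelimiter('.', Uri.Host);
  if LastDot = 0 then
    Exit; // no dot at all, e.g. a plain computer name
  // Number of characters after the last dot.
  Result := Length(Uri.Host) - LastDot >= 2;
end;

begin
  Writeln(HasPlausibleTld('https://www.google.com'));
  Writeln(HasPlausibleTld('https://www.google.c'));
  Writeln(HasPlausibleTld('https://duck'));
end.
```

Whether such a heuristic is acceptable depends on the application. It remains a purely syntactic guess and cannot predict whether a later connection attempt will succeed.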