How can I extract the domain suffix without entering http:// or https://? For example, if I enter stackoverflow.com, I want to get com as the result.

I have this function, but I must enter http:// to get the result.

Is there any way to skip entering http:// and https://?

procedure TForm1.Button2Click(Sender: TObject);
  function RatChar(S:String; C: Char):Integer;
  var
    i : Integer;
  begin
    i := Length(S);
    //while (S[i] <> C) and (i > 0) do
    while (i > 0) and (S[i] <> C) do
      Dec(i);
    Result := i;
  end;
var
  uri: TIdURI;
  i: Integer;
begin
  uri := TidURI.Create(Edit2.Text);
  try
    //Memo1.Lines.Add(uri.Protocol);
    //Memo1.Lines.Add(uri.Host);
    i := RatChar(uri.Host, '.');
    Memo1.Lines.Add(Copy(uri.Host, i+1, Length(uri.Host)));
    Memo1.Lines.Add(uri.Document);
  finally
    uri.Free;
  end;
end;
Remy Lebeau
MrSiMo
  • `TIdURI` requires `://` to be present in order to parse out the `Host`; otherwise it assumes the entered string is a file path instead. If your strings are not URIs, then you will have to parse them manually. In any case, why do you need the domain suffix? What are you going to do with it? You do realize that many suffixes (mainly for country domains) have multiple levels to them, don't you? I.e. `.com` vs `.co.uk`, so your code would have failed for those kinds of domains anyway. How are you planning on handling that, even if you could get the `Host`? – Remy Lebeau Feb 18 '22 at 00:53
  • What are you going to do with it? I want to add this to the Indy WHOIS client, because the Indy WHOIS client supports only one WHOIS server, unlike the ICS WHOIS client, which supports all WHOIS servers and does an automatic query. The problem is that the ICS WHOIS client does not support mobile, but Indy does. I want to create a mobile app that monitors domain names and sends me notifications when the expiration date is approaching. You are right, .co.uk does not work :( – MrSiMo Feb 18 '22 at 20:45
  • This component does it: http://svn.overbyte.be:8443/svn/ics/trunk/Source/OverbyteIcsWhoisCli.pas but the only problem is that it does not support mobile. – MrSiMo Feb 18 '22 at 22:45
  • Do you really need just the suffix, though? I would expect a WHOIS to find an expiration to require a whole domain/subdomain, regardless of how many segments it has. WHOIS of a suffix doesn't really make sense. So it is still not clear to me what you are really trying to achieve. You say you want to monitor domains, so why can't you just monitor domains as-is? Why do you need to parse them at all? It would really help if you provided examples that you are having trouble with. – Remy Lebeau Feb 19 '22 at 01:18
  • I need the suffix to select the proper WHOIS server, for example: 'com whois.verisign-grs.com', 'org whois.pir.org', 'net whois.verisign-grs.com', 'uk whois.nic.uk', 'ac whois.nic.ac'. This function works fine, but I don't know how to copy the result in edit1.text. Right now I have to set edit2.text := domain1 and edit3.text := domain2; how can I set both in edit2.text? – MrSiMo Feb 19 '22 at 15:42
  • const IcsSPACE = #32;
    procedure TForm1.Button1Click(Sender: TObject);
    var
      I, J, K: integer;
      domain1, domain2: string;
    begin
      for J := Length(edit1.text) downto 2 do
      begin
        if edit1.text[J] = '.' then
        begin // search host.co.uk
          if domain1 = '' then
            domain1 := Copy(edit1.text, J + 1, 99) + IcsSpace // found uk
          else
          begin
            domain2 := Copy(edit1.text, J + 1, 99) + IcsSpace;
            Break;
          end;
        end;
      end;
    end;
    – MrSiMo Feb 19 '22 at 15:47
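
The suffix-to-server table quoted in the comments above could also be consulted without isolating the suffix first: try the whole name, then drop the leftmost label until an entry matches. The following is only a minimal sketch under that assumption. `FindWhoisServer`, `TWhoisEntry` and `WhoisTable` are hypothetical names, the input is assumed to be a bare host name (no scheme or path), and the table holds only the entries quoted above.

```pascal
program WhoisLookupSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

type
  TWhoisEntry = record
    Suffix, Server: string;
  end;

const
  // Only the entries quoted in the comments; a real table would be much longer.
  WhoisTable: array[0..4] of TWhoisEntry = (
    (Suffix: 'com'; Server: 'whois.verisign-grs.com'),
    (Suffix: 'org'; Server: 'whois.pir.org'),
    (Suffix: 'net'; Server: 'whois.verisign-grs.com'),
    (Suffix: 'uk';  Server: 'whois.nic.uk'),
    (Suffix: 'ac';  Server: 'whois.nic.ac'));

// Returns the server for the longest suffix of Host found in WhoisTable,
// or '' if nothing matches. (Hypothetical helper, not part of Indy or ICS.)
function FindWhoisServer(const Host: string): string;
var
  Candidate: string;
  i, DotPos: Integer;
begin
  Result := '';
  Candidate := LowerCase(Trim(Host));
  while Candidate <> '' do
  begin
    for i := Low(WhoisTable) to High(WhoisTable) do
      if WhoisTable[i].Suffix = Candidate then
        Exit(WhoisTable[i].Server);
    DotPos := Pos('.', Candidate);
    if DotPos = 0 then
      Break; // no labels left to drop
    Candidate := Copy(Candidate, DotPos + 1, MaxInt);
  end;
end;

begin
  Writeln(FindWhoisServer('stackoverflow.com')); // whois.verisign-grs.com
  Writeln(FindWhoisServer('domain.co.uk'));      // whois.nic.uk
end.
```

With this table, domain.co.uk is tried as domain.co.uk, then co.uk, then uk, so a multi-level suffix falls through to the uk entry unless a more specific entry is added.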

2 Answers

Following the suggestion in "Extracting top-level and second-level domain from a URL using regex", it should look like this in Delphi:

program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses
  System.SysUtils, System.RegularExpressions;
var
  url,
  rePattern: string;
  rMatch   : TMatch;
  rGroup   : TGroup;
  arr      : TArray<string>;
begin
  try
    url := 'https://www.answers.com/article/1194427/8-habits-of-extraordinarily likeable-people';
    //url := 'https://stackoverflow.com/questions/71166883/how-to-extract-domain-suffix';
    rePattern := '^(?:https?:\/\/)(?:w{3}\.)?.*?([^.\r\n\/]+\.)([^.\r\n\/]+\.[^.\r\n\/]{2,6}(?:\.[^.\r\n\/]{2,6})?).*$';
    rMatch := TRegEx.Match(url, rePattern);
    if rMatch.Success then
    begin
      rGroup := rMatch.Groups.Item[pred(rMatch.Groups.Count)];
      arr := rGroup.Value.Split(['.']);
      writeln('Top-Level-Domain: ', arr[High(arr)]);
    end
    else
     writeln('Sorry');
    readln;
  except
   on E: Exception do
     Writeln(E.ClassName, ': ', E.Message);
  end;
end.

However, this regular expression only matches when the host has a leading label such as www. in front of the domain name.
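
One way to accept input with or without a scheme is to make the scheme part of the pattern optional and capture only the host. This is a rough sketch, not a replacement for the pattern above: `LastHostLabel` is a hypothetical helper, the pattern only recognises http and https, and it returns just the last label (so uk for a .co.uk host).

```pascal
program OptionalSchemeSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.RegularExpressions;

// Returns the last dot-separated label of the host in Input, or '' when no
// host can be found. The scheme (http:// or https://) is optional.
function LastHostLabel(const Input: string): string;
var
  M: TMatch;
  Host: string;
begin
  Result := '';
  // Group 1 is the host: everything up to the first '/', ':', '?', '#' or space.
  M := TRegEx.Match(Trim(Input), '^(?:https?://)?([^/:?#\s]+)', [roIgnoreCase]);
  if not M.Success then
    Exit;
  Host := M.Groups[1].Value;
  // LastDelimiter returns 0 when there is no dot, so the whole host is returned.
  Result := Copy(Host, LastDelimiter('.', Host) + 1, MaxInt);
end;

begin
  Writeln(LastHostLabel('stackoverflow.com'));                 // com
  Writeln(LastHostLabel('https://www.answers.com/article/1')); // com
end.
```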

USauter
  • This mask gives the suffix without http or www: `rePattern := '^(?:w{3}\.)?.*?([^.\r\n\/]+\.)([^.\r\n\/]+\.[^.\r\n\/]{2,6}(?:\.[^.\r\n\/]{2,6})?).*$';` – MrSiMo Feb 18 '22 at 23:20
  • Code and regex will fail on `https://www.bbc.co.uk` ([second-level domain](https://en.wikipedia.org/wiki/Second-level_domain)) and [TLDs exceeding 6 characters (like `.support` and `.academy`)](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains). – AmigoJack Jun 15 '23 at 01:16
uses
  System.SysUtils;
var
  u  : string;
  arr: TArray<string>;
begin
  try
    u   := 'https://stackoverflow.com/questions/71166883/how-to-extract-domain-suffix';
    // Drop the scheme if present; input without '://' is left unchanged
    arr := u.Split(['://'], TStringSplitOptions.ExcludeEmpty);
    u   := arr[High(arr)]; // stackoverflow.com/questions/71166883/how-to-extract-domain-suffix
    arr := u.Split(['/'], TStringSplitOptions.ExcludeEmpty);
    u   := arr[0]; // stackoverflow.com
    arr := u.Split(['.'], TStringSplitOptions.ExcludeEmpty);
    u   := arr[High(arr)]; // com
    writeln('Top-Level-Domain: ', u);
    readln;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
USauter
  • Thank you, it works fine for a top-level domain but does not work for domains such as domain.co.uk. Is there any way to fix this issue? – MrSiMo Feb 18 '22 at 19:14
  • Possible workaround: `if Length(arr[High(arr)-1]) <= 2 then u := arr[High(arr)-1] + '.' + arr[High(arr)];` – USauter Feb 19 '22 at 09:07
  • This shows the full domain name with its extension. – MrSiMo Feb 19 '22 at 15:40
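
The length check in the workaround also fires for any registered name of one or two characters (for example ab.com), which would return the whole domain. A possible alternative, sketched here under the assumption that the multi-level suffixes of interest are known in advance, is to compare the last two labels against an explicit list. `DomainSuffix` and `TwoLevelSuffixes` are hypothetical names, and the list contains only co.uk for illustration.

```pascal
program MultiLevelSuffixSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

const
  // Illustrative only: a real list of multi-level suffixes is far longer.
  TwoLevelSuffixes: array[0..0] of string = ('co.uk');

// Returns the last two labels when they form a known multi-level suffix,
// otherwise the last label; '' for empty input. Host must be a bare host name.
function DomainSuffix(const Host: string): string;
var
  Lower, LastTwo, Known: string;
  Labels: TArray<string>;
begin
  Result := '';
  Lower := LowerCase(Trim(Host));
  Labels := Lower.Split(['.'], TStringSplitOptions.ExcludeEmpty);
  if Length(Labels) = 0 then
    Exit;
  Result := Labels[High(Labels)];
  // Need at least three labels: name + two-level suffix.
  if Length(Labels) >= 3 then
  begin
    LastTwo := Labels[High(Labels) - 1] + '.' + Result;
    for Known in TwoLevelSuffixes do
      if Known = LastTwo then
        Exit(LastTwo);
  end;
end;

begin
  Writeln(DomainSuffix('domain.co.uk'));      // co.uk
  Writeln(DomainSuffix('ab.com'));            // com
  Writeln(DomainSuffix('stackoverflow.com')); // com
end.
```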