I need to match URLs against a pattern. The pattern should match the domain regardless of whether the URL ends in a trailing slash, has query-string parameters, or has a subdomain. Only the protocols http and https should be accepted.

Here is what I tried:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        // Sample URLs: the two test.chn entries should NOT match.
        List<string> inputs = new List<string>
        {
            "https://dotnetfiddle.net/UA6bCb",
            "http://www.test.ch/de-ch/apps/weve?anlassId=236601",
            "https://www.test.ch/de-ch/apps/weve?anlassId=236601",
            "http://test.ch/de-ch/apps/weve?anlassId=236601",
            "https://test.ch/de-ch/apps/weve?anlassId=236601",
            "https://test.chn/de-ch/apps/weve?anlassId=236601",
            "https://www.test.chn/de-ch/apps/weve?anlassId=236601",
            "https://test.ch/de-ch/",
            "https://test.ch/de-ch",
            "https://test.ch/",
            "https://test.ch",
            "https:test.ch"
        };

        Test(inputs);
    }

    public static void Test(List<string> inputs)
    {
        // Optional "s", optional subdomain, the domain, then an optional path/query part.
        var regexString = @"http(s)?://?([\w-]+\.)?test.ch(/[\w- ;,./?%&=]*)?";
        foreach (var input in inputs)
        {
            var match = Regex.Match(input, regexString, RegexOptions.Compiled | RegexOptions.IgnoreCase);

            if (match.Success)
            {
                Console.WriteLine("Match: {0}", input);
            }
            else
            {
                Console.WriteLine("NO MATCH: {0}", input);
            }
        }
    }
}

This returns

NO MATCH: https://dotnetfiddle.net/UA6bCb
Match: http://www.test.ch/de-ch/apps/weve?anlassId=236601
Match: https://www.test.ch/de-ch/apps/weve?anlassId=236601
Match: http://test.ch/de-ch/apps/weve?anlassId=236601
Match: https://test.ch/de-ch/apps/weve?anlassId=236601
Match: https://test.chn/de-ch/apps/weve?anlassId=236601
Match: https://www.test.chn/de-ch/apps/weve?anlassId=236601
Match: https://test.ch/de-ch/
Match: https://test.ch/de-ch
Match: https://test.ch/
Match: https://test.ch
NO MATCH: https:test.ch

The problem is that this solution also matches https://test.chn/de-ch/apps/weve?anlassId=236601 and https://www.test.chn/de-ch/apps/weve?anlassId=236601.

These should not match, because the domain ends in chn.

I haven't been able to get the right regex.

Thanks for the help.

CodeHacker
  • Use anchors. See https://regex101.com/r/PrYANW/1 – Wiktor Stribiżew Mar 02 '21 at 14:40
  • var regexString= @"http(s)?://?([\w-]+\.)?test.ch/([\w- ;,./?%&=]*)?"; – azuremycry Mar 02 '21 at 14:45
  • @azuremycry Thank you, but this won't work either, because this causes https://test.ch not to match but as stated I need it to be able to be with or without trailing slash – CodeHacker Mar 02 '21 at 14:48
  • You don't need a Regex. You can pass the entire url to the `UriBuilder` class and then get back the `Host` property. It gives you back the domain name without the trailing slash nor any protocol (`http://`) in front of it. Note it keeps instead the `www.` – Krusty Mar 02 '21 at 14:55
  • @WiktorStribiżew did you even read the question? This is far from a duplicate, I'm not trying to match an entire string – CodeHacker Mar 02 '21 at 14:55
  • You are not trying, you *are* matching entire strings that are URLs. If you need a more specific answer, please clarify the question. And your regex already [matches](https://regex101.com/r/PrYANW/2) URLs with and without trailing slash. – Wiktor Stribiżew Mar 02 '21 at 14:57
  • @Krusty No. This is just code to demonstrate the case. I have a configuration file on a system where I can configure some URL patterns that are allowed to access the system. So I'm searching for a regex, not for C# code. – CodeHacker Mar 02 '21 at 14:58
  • Other than adding anchors, you may further enhance the regex to `^https?://([^/]+\.)?test\.ch(/.*)?$`. See https://ideone.com/2gnN6o. No match for `chn` domain, as you need. – Wiktor Stribiżew Mar 02 '21 at 15:21
  • Answer for your question is: var regexString= @"http(s)?://?([\w-]+\.)?test.ch(/|$)([\w- ;,./?%&=]*)?"; – azuremycry Mar 03 '21 at 07:19
  • @azuremycry Thank you. Sad I can't mark an answer, because my question was closed (false-duplicate) – CodeHacker Mar 03 '21 at 10:29
  • but just tell me if my solution works for you :)? – azuremycry Mar 03 '21 at 13:42
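
For reference, the anchored pattern suggested in the comments above can be tried against a few of the question's inputs with a minimal sketch like the one below. The class and method names (`AnchoredPatternDemo`, `IsAllowed`) are made up for illustration and do not come from the thread; only the pattern itself is taken from the comment.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredPatternDemo
{
    // Pattern from the comment above: ^ and $ force the whole URL to match,
    // and \. makes the dot in "test.ch" a literal dot.
    private static readonly Regex Allowed = new Regex(
        @"^https?://([^/]+\.)?test\.ch(/.*)?$",
        RegexOptions.IgnoreCase);

    // Hypothetical helper name, not part of the original thread.
    public static bool IsAllowed(string url)
    {
        return Allowed.IsMatch(url);
    }

    public static void Main()
    {
        var samples = new[]
        {
            "https://test.ch",
            "https://www.test.ch/de-ch/",
            "https://test.chn/de-ch",
            "https:test.ch"
        };

        foreach (var url in samples)
        {
            Console.WriteLine("{0}: {1}", url, IsAllowed(url) ? "Match" : "NO MATCH");
        }
    }
}
```

Because of the `(/.*)?$` tail, this pattern expects a slash before anything that follows the host, so a URL such as https://test.ch?x=1 (query string directly after the domain) would not match. Whether that matters depends on the URLs you expect.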

1 Answer

If you just want to exclude test.chn, you can use a negative lookahead to ensure that ch is not followed by n:

@"http(s)?://?([\w-]+\.)?test.ch(?!n)(/[\w- ;,./?%&=]*)?"

I added the part (?!n).
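
As a rough sketch of what the lookahead does and does not cover: `(?!n)` only rejects an `n` directly after `test.ch`, so a host such as test.chx would still get through. The second pattern below is a hypothetical stricter variant and an assumption on my part, not something from the question: it escapes the dot and rejects any further host character after `test.ch`. The names `LookaheadDemo`, `OnlyN`, `AnyHostChar` and `Matches` are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Pattern from this answer: (?!n) rejects only an "n" right after "test.ch".
    public const string OnlyN =
        @"http(s)?://?([\w-]+\.)?test.ch(?!n)(/[\w- ;,./?%&=]*)?";

    // Hypothetical stricter variant (an assumption, not from the thread):
    // literal dot, and a lookahead that rejects any word character, dot or
    // hyphen immediately after "test.ch".
    public const string AnyHostChar =
        @"https?://([\w-]+\.)?test\.ch(?![\w.-])(/[\w\- ;,./?%&=]*)?";

    // Hypothetical helper: true if the pattern matches somewhere in the URL.
    public static bool Matches(string pattern, string url)
    {
        return Regex.IsMatch(url, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        var samples = new[]
        {
            "https://test.ch",
            "https://test.chn/de-ch",
            "https://test.chx/de-ch"
        };

        foreach (var url in samples)
        {
            Console.WriteLine("{0}: (?!n)={1}, (?![\\w.-])={2}",
                url, Matches(OnlyN, url), Matches(AnyHostChar, url));
        }
    }
}
```

Both patterns are still unanchored, so they match a URL anywhere inside a longer string; if the configuration compares whole strings, adding `^` and `$` as suggested in the comments is probably still worthwhile.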

Jahan Afshari