I need a regex to find a valid URL in an email body, but only if the match contains a '?'.

Examples:

  • www.example.com.br/area?key=235fksf&rec=fsjgsg (OK)
  • www.example.com.br/area?key=235fksf (OK)
  • www.example.com.br/area (Not OK)

Thanks

Edit: I discovered that some emails split the URL across two lines with a newline ("\r\n").

PS: I'm using https://regex101.com/ to run some tests, but it's not working as I described.

Edit:

Final Solution

To handle the possible newline, I used Hector's answer and modified it to do what I needed.

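// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
// The [\r\n]* in the middle of the path lets a match continue across one line break.
var matches = Regex.Matches(body, @"(https?://)?([\w-]+\.)+[\w-]+(/[\w;,./?%&=-]*[\r\n]*[\w;,./?%&=-]*)?");
var url = string.Empty;

foreach (Match match in matches)
{
    // Only a match that contains a '?' is of interest.
    if (match.Value.Contains('?'))
    {
        // Split on both newline characters so this also works where Environment.NewLine is just "\n".
        var matchSplit = match.Value.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var matchUnit in matchSplit)
        {
            // Is it a valid piece? Keep it only if it contains a URL delimiter,
            // which drops ordinary text picked up from the following line.
            if (matchUnit.Any(x => "/?&=".Contains(x)))
                url += matchUnit;
        }
        break; // stop at the first URL that has a '?'
    }
}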

1 Answer

Components of a URI

foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/\__________/ \__/
   |           |             |           |        |
scheme     authority       path        query   fragment

Scheme

The scheme is the first item of a URL, such as http, which indicates that the URI uses the Hypertext Transfer Protocol. Other common schemes include https, ftp, and mailto.

Authority

In a URL, the authority is often called the domain (or host) and may include a port number at the end, separated by a colon.

In the following example, the authority is www.cambiaresearch.com

http://www.cambiaresearch.com

In the following example, the authority is www.cambiaresearch.com:81

https://www.cambiaresearch.com:81

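A mailto URI has no "//" and so, strictly speaking, no authority component. In the following example, info@cambiaresearch.com is formally the path, although it plays a similar role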

mailto:info@cambiaresearch.com

Path

The path component of the URL identifies a specific resource (a file or page) at a particular domain. The path is terminated by the end of the URL, by a question mark (?), which signifies the beginning of the query string, or by the number sign (#), which signifies the beginning of the fragment.

The path of the following URL is "/default.htm"

http://www.cambiaresearch.com/default.htm

The path of the following URL is "/snippets/csharp/regex/uri_regex.aspx"

http://www.cambiaresearch.com/snippets/csharp/regex/uri_regex.aspx

Query

The query part of the URL is a way to send some information to the path or webpage that will handle the web request. The query begins with a question mark (?) and is terminated by the end of the URL or a number sign (#) which signifies the beginning of the fragment.

The query of the following URL is "?id=241"

http://www.cambiaresearch.com/default.htm?id=241

The query of the following URL is "?sourceid=navclient&ie=UTF-8&rls=GGLC,GGLC:1969-53,GGLC:en&q=uri+query"

http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLC,GGLC:1969-53,GGLC:en&q=uri+query
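
As a rough illustration of how a query like the ones above is usually read, here is a minimal sketch that splits it into key/value pairs. It assumes the common key=value&key=value convention (the URI syntax itself does not require it), and QueryPairs.Parse is a hypothetical helper name, not a framework API.

```csharp
using System;
using System.Collections.Generic;

public static class QueryPairs
{
    // Splits "?a=1&b=2" (with or without the leading '?') into key/value pairs.
    // Assumption: pairs are separated by '&' and keys from values by the first '='.
    public static Dictionary<string, string> Parse(string query)
    {
        var pairs = new Dictionary<string, string>();
        var parts = query.TrimStart('?').Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var part in parts)
        {
            // Limit the split to 2 so that an '=' inside the value is preserved.
            var keyValue = part.Split(new[] { '=' }, 2);
            var key = Uri.UnescapeDataString(keyValue[0]);
            var value = keyValue.Length > 1 ? Uri.UnescapeDataString(keyValue[1]) : string.Empty;
            pairs[key] = value; // a repeated key keeps its last value
        }
        return pairs;
    }
}
```

For example, QueryPairs.Parse("?key=235fksf&rec=fsjgsg") yields the pairs key → 235fksf and rec → fsjgsg. Note that Uri.UnescapeDataString decodes %XX sequences but leaves '+' as it is, so a value such as uri+query is not turned into "uri query".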

Fragment

In a URL the fragment is used to specify a location within the current page. This is often used in a FAQ with a list of links at the top of the page linking to longer descriptions farther down in the page.

The fragment of the following URL is "contact"

http://www.cambiaresearch.com/default.htm#contact

The fragment of the following URL is "scheme"

http://www.cambiaresearch.com/snippets/csharp/regex/uri_regex.aspx#scheme
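
The components described above can also be read without a regex once a URL has been isolated. The following is a minimal sketch using the .NET System.Uri class; UriParts.Describe is a hypothetical helper name introduced only for this illustration, and the input is assumed to be an absolute URI that includes a scheme.

```csharp
using System;

public static class UriParts
{
    // Returns the five components of an absolute URI on one line.
    public static string Describe(string text)
    {
        // Throws UriFormatException if text is not an absolute URI,
        // for example "www.example.com.br/area", which has no scheme.
        var uri = new Uri(text);

        return string.Join(" | ",
            "scheme=" + uri.Scheme,
            "authority=" + uri.Authority,
            "path=" + uri.AbsolutePath,
            "query=" + uri.Query,        // includes the leading '?'
            "fragment=" + uri.Fragment); // includes the leading '#'
    }
}
```

Calling UriParts.Describe("https://www.cambiaresearch.com:81/default.htm?id=241#contact") returns "scheme=https | authority=www.cambiaresearch.com:81 | path=/default.htm | query=?id=241 | fragment=#contact". Because System.Uri needs a scheme, URLs written without one, as in the question, would have to be prefixed (for example with "http://") before being parsed this way.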


Example: Regular Expressions for Parsing URIs and URLs

A simple approach: match the URL first, then test the matched text for a question mark with the [?] pattern:

// Requires: using System.Text.RegularExpressions; (MessageBox comes from System.Windows.Forms)
public bool RegexUrlWithQuestionChar(string url)
{
    // URL pattern; note that the path character class also allows a space.
    string pattern = @"(http(s)?://)?([\w-]+\.)+[\w-]+(/[\w- ;,./?%&=]*)?";

    var regex = new Regex(pattern);
    var match = regex.Match(url);

    // True only if the matched URL contains a '?'.
    return new Regex("[?]").IsMatch(match.Value);
}

if (RegexUrlWithQuestionChar("www.example.com.br/area?key=235fksf&rec=fsjgsg"))
{
    MessageBox.Show("Found"); // This one is shown
}
else
{
    MessageBox.Show("Not found");
}

if (RegexUrlWithQuestionChar("www.example.com.br/area"))
{
    MessageBox.Show("Found");
}
else
{
    MessageBox.Show("Not found"); // This one is shown
}
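
The two-step check above could also be folded into the pattern itself by making the question mark mandatory. The sketch below is one possible variant, not the method used in this answer: QueryUrlFinder and FindAll are hypothetical names, the space has been dropped from the character classes so a match stops at whitespace, and URLs split across lines are not handled.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryUrlFinder
{
    // Same building blocks as the pattern above, but a literal '?' (written \?)
    // must follow the optional path, so URLs without a query never match.
    private static readonly Regex UrlWithQuery = new Regex(
        @"(https?://)?([\w-]+\.)+[\w-]+(/[\w;,./%&=-]*)?\?[\w;,./?%&=-]*");

    // Returns every URL in the text that contains a '?'.
    public static List<string> FindAll(string body)
    {
        var found = new List<string>();
        foreach (Match match in UrlWithQuery.Matches(body))
            found.Add(match.Value);
        return found;
    }
}
```

Given a body such as "See www.example.com.br/area and www.example.com.br/area?key=235fksf&rec=fsjgsg today", FindAll returns only the second URL. One caveat: a sentence that ends in a domain followed by a question mark, such as "Did you visit example.com?", would also match, so an extra check may be needed for real email text.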

Credits:

urlregex.com

parsing-urls-with-regular-expressions-and-the-regex-object

www.dotnetperls.com/regex

Héctor M.