
Any ideas how I can use a single regular expression to validate a single url and also match urls in a text block?

var x = "http://myurl.com";
var t = "http://myurl.com ref";
var y = "some text that contains a url http://myurl.com some where";

var expression = "\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]";

Regex.IsMatch(x, expression, RegexOptions.IgnoreCase); // returns true;
Regex.IsMatch(t, expression, RegexOptions.IgnoreCase); // returns false;

Regex.Matches(y, expression, RegexOptions.IgnoreCase); // returns http://myurl.com;
Rohan West
  • possible duplicate of [How to replace plain URLs with links?](http://stackoverflow.com/questions/37684/how-to-replace-plain-urls-with-links) – AeroX Jul 31 '14 at 14:20
  • 1
    I think this is not the same question. The OP is interested in how to use a single regex pattern for two purposes (match an url and find urls inside a text). I don't think the problem here is how a regex pattern should look like to match all real existent urls. But it is a good reference of course. – Robert S. Jul 31 '14 at 14:52

2 Answers


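First of all, you have to escape correctly. In a regular C# string literal, "\b" is the backspace character, not the regex word boundary, so use "\\b..." (or a verbatim string, @"\b...") instead of "\b...". Second, IsMatch is also true for partial matches, i.e. when the input merely contains a URL somewhere. You can check whether the whole input matches like this: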

Match match = Regex.Match(x, expression, RegexOptions.IgnoreCase);

if (match.Success && match.Length == x.Length)
{
    // full match: the whole input is a URL
}
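To see why the escaping matters, here is a small sketch that compares the three ways of writing the pattern. `EscapeDemo` and `IsUrlMatch` are illustrative names, not part of the question or answer. In a regular string literal `"\b"` becomes the single character U+0008, so the regex engine looks for a literal backspace before the scheme; `"\\b"` and `@"\b"` both hand the two characters `\` and `b` to the engine, which reads them as a word boundary.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // As written in the question: "\b" is the C# escape for backspace (U+0008).
    public const string Unescaped = "\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]";

    // Backslash escaped: the regex engine receives \b (word boundary).
    public const string Escaped = "\\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]";

    // Verbatim string: no C# escape processing, so it is identical to Escaped.
    public const string Verbatim = @"\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]";

    public static bool IsUrlMatch(string input, string pattern) =>
        Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
}

class Program
{
    static void Main()
    {
        Console.WriteLine((int)EscapeDemo.Unescaped[0]);                                    // 8 (backspace)
        Console.WriteLine(EscapeDemo.Escaped == EscapeDemo.Verbatim);                      // True
        Console.WriteLine(EscapeDemo.IsUrlMatch("http://myurl.com", EscapeDemo.Unescaped)); // False
        Console.WriteLine(EscapeDemo.IsUrlMatch("http://myurl.com", EscapeDemo.Escaped));   // True
    }
}
```

The unescaped pattern fails even on the plain URL because the input contains no backspace character.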

With this check and the escape fix, your expression works as it is. You can also write a helper method for it:

// Returns true only if the first match covers the entire input string.
private bool FullMatch(string input, string pattern, RegexOptions options)
{
    Match match = Regex.Match(input, pattern, options);

    return match.Success && match.Length == input.Length;
}

Your code will change to this:

var x = "http://myurl.com";
var t = "http://myurl.com ref";
var y = "some text that contains a url http://myurl.com some where";

var expression = "\\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]";

FullMatch(x, expression, RegexOptions.IgnoreCase); // returns true;
FullMatch(t, expression, RegexOptions.IgnoreCase); // returns false;

Regex.Matches(y, expression, RegexOptions.IgnoreCase); // returns one match: http://myurl.com
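For reference, here is a self-contained sketch that puts these pieces into one compilable file. `UrlRegex`, `IsUrl` and `FindUrls` are illustrative names, and `FullMatch` is made `static` here so it can be called without an instance. `IsUrl` shows an alternative the answer does not use: wrapping the same pattern in `\A(?:pattern)\z`, so that `Regex.IsMatch` itself can only succeed when the whole string is a URL.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class UrlRegex
{
    // The pattern from the answer, written as a verbatim string so \b is a word boundary.
    public const string Pattern = @"\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]";

    // The answer's approach: the first match must span the whole input.
    public static bool FullMatch(string input, string pattern, RegexOptions options)
    {
        Match match = Regex.Match(input, pattern, options);
        return match.Success && match.Length == input.Length;
    }

    // Alternative (not from the answer): anchor the same pattern to the start (\A)
    // and the very end (\z) of the input.
    public static bool IsUrl(string input) =>
        Regex.IsMatch(input, @"\A(?:" + Pattern + @")\z", RegexOptions.IgnoreCase);

    // Search mode: every URL found inside a block of text.
    public static string[] FindUrls(string text) =>
        Regex.Matches(text, Pattern, RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}

class Program
{
    static void Main()
    {
        var x = "http://myurl.com";
        var t = "http://myurl.com ref";
        var y = "some text that contains a url http://myurl.com some where";

        Console.WriteLine(UrlRegex.FullMatch(x, UrlRegex.Pattern, RegexOptions.IgnoreCase)); // True
        Console.WriteLine(UrlRegex.FullMatch(t, UrlRegex.Pattern, RegexOptions.IgnoreCase)); // False
        Console.WriteLine(UrlRegex.IsUrl(t));                                                  // False
        Console.WriteLine(string.Join(", ", UrlRegex.FindUrls(y)));                            // http://myurl.com
    }
}
```

`\z` is used instead of `$` because, without `RegexOptions.Multiline`, `$` also matches just before a final newline, so `"http://myurl.com\n"` would still pass.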
Robert S.

I think the word boundary is tripping you up: `\b` is not a separator check; it only matches at the edge of a word character, so it will not match between two non-word characters.

Try this:

var expression = @"(^|\s)(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]($|\s)";

This anchors the start of the match to the beginning of the string or a whitespace character, and the end of the match to the end of the string or a whitespace character.
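One side effect of `(^|\s)` and `($|\s)` is that the whitespace becomes part of the match, and a single space between two URLs can only be consumed once. The sketch below compares the pattern above with a variant that swaps in zero-width lookbehind and lookahead (`(?<=^|\s)`, `(?=$|\s)`). That variant is an assumption for illustration, not part of this answer; `WhitespaceBounded` and `Find` are likewise illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WhitespaceBounded
{
    // Pattern from the answer: the surrounding whitespace is consumed by the match.
    public const string Consuming =
        @"(^|\s)(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]($|\s)";

    // Hypothetical variant: the same boundaries as zero-width lookarounds,
    // so the whitespace is checked but not included in the match.
    public const string Lookaround =
        @"(?<=^|\s)(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|](?=$|\s)";

    public static string[] Find(string text, string pattern) =>
        Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}

class Program
{
    static void Main()
    {
        var text = "see http://a.com http://b.com";

        // Consuming: one match, " http://a.com ", because the space between
        // the URLs is used up by the first match.
        Console.WriteLine(string.Join("|", WhitespaceBounded.Find(text, WhitespaceBounded.Consuming)));

        // Lookaround: two matches, "http://a.com" and "http://b.com".
        Console.WriteLine(string.Join("|", WhitespaceBounded.Find(text, WhitespaceBounded.Lookaround)));
    }
}
```

Either way, `Regex.IsMatch` still returns true for `"http://myurl.com ref"`, because it only needs one match somewhere in the input.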


More info: http://www.regular-expressions.info/wordboundaries.html

There are three different positions that qualify as word boundaries:

• Before the first character in the string, if the first character is a word character.
• After the last character in the string, if the last character is a word character.
• Between two characters in the string, where one is a word character and the other is not a word character.

Simply put: `\b` allows you to perform a "whole words only" search using a regular expression in the form of `\bword\b`. A "word character" is a character that can be used to form words. All characters that are not "word characters" are "non-word characters".
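A quick way to see these rules applied to a URL is to list every position where `\b` matches. This is a minimal sketch; `BoundaryDemo` and `BoundaryPositions` are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class BoundaryDemo
{
    // Returns every index at which the zero-width \b assertion matches.
    public static int[] BoundaryPositions(string input) =>
        Regex.Matches(input, @"\b")
             .Cast<Match>()
             .Select(m => m.Index)
             .ToArray();
}

class Program
{
    static void Main()
    {
        // Boundaries: start (before 'h'), before ':', after the second '/',
        // on both sides of '.', and at the end of the string.
        Console.WriteLine(string.Join(",", BoundaryDemo.BoundaryPositions("http://myurl.com"))); // 0,4,7,12,13,16

        // '(' is a non-word character, so there is a boundary before "http".
        Console.WriteLine(Regex.IsMatch("(http://myurl.com)", @"\b(https?|ftp|file)://"));    // True

        // 'y' and 'h' are both word characters, so there is no boundary before "http".
        Console.WriteLine(Regex.IsMatch("myhttp://myurl.com", @"\b(https?|ftp|file)://"));    // False
    }
}
```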

Brad
  • The `\s` isn't escaped correctly here. But as good as your regex pattern is, `Regex.IsMatch` will return `true` for the second input as it finds a partial match. The problem isn't the regex pattern but `Regex.IsMatch` as it allows partial matches. – Robert S. Jul 31 '14 at 14:46
  • Edited to fix the `\s` issue. Why isn't `true` valid for the second input? "...and also match urls in a text block" looks like a "text block" to me :-?
  • The OP wants a regex pattern to match a string to a URL. The second input `"http://myurl.com ref"` isn't a valid URL, so a full-match approach should return false, while an in-text search should return the URL within it. Therefore the `Regex.Matches` method should be used for in-text search, and a full-match method should be used to check whether a single string is a URL. `Regex.IsMatch` won't do that, as it is also `true` if the input string CONTAINS a valid URL rather than IS a URL. – Robert S. Jul 31 '14 at 22:22