3

I am trying to find links in user-entered text and convert them to hyperlinks automatically.

I am currently using the following Regex, which works well for finding hyperlinks in text.

Regex regexResolveUrl = new Regex("((http://|www\\.)([A-Z0-9.-:]{1,})\\.[0-9A-Z?;~&#=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase);

It has worked for almost all the links I have come across so far, but it has a problem detecting links that contain a hyphen.

For example, www.abc-xyz.com is not matched by the regex above. Can anyone help me with this?

Aaron Butacov
semmy

3 Answers

8

If you want - to mean a literal dash in a character class, put it as the last (or first) character. So [abc-] is a character class containing 4 characters: a, b, c, and -. On the other hand, [ab-c] contains only 3 characters (a, b, c) and does not include -, because there the - defines the range b to c.

So, something like this (from your pattern):

[A-Z0-9.-:]

This defines 3 ranges: from A to Z, from 0 to 9, and from . (ASCII 46) to : (ASCII 58). The hyphen itself (ASCII 45) falls just outside that last range, so it is not matched. You want instead:

[A-Z0-9.:-]
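
As a rough check of the difference, here is a minimal sketch that runs the pattern from the question and a version with the hyphen moved to the end of the first character class. The class name, method name, and sample sentence are made up for illustration; only the two patterns come from the question and the fix above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenInCharacterClass
{
    // Pattern from the question: ".-:" in the first class is a range, so '-' is not included.
    public const string Original = @"((http://|www\.)([A-Z0-9.-:]{1,})\.[0-9A-Z?;~&#=\-_\./]{2,})";

    // Same pattern with the hyphen moved to the end of the first class, where it is literal.
    public const string Fixed = @"((http://|www\.)([A-Z0-9.:-]{1,})\.[0-9A-Z?;~&#=\-_\./]{2,})";

    // Returns the first match of the pattern in the text, or null when nothing matches.
    public static string FirstLink(string pattern, string text)
    {
        Match m = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string text = "visit www.abc-xyz.com today"; // hypothetical sample input

        Console.WriteLine(FirstLink(Original, text) ?? "(no match)"); // (no match)
        Console.WriteLine(FirstLink(Fixed, text) ?? "(no match)");    // www.abc-xyz.com
    }
}
```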

Note on repetition

I noticed that you used {1,} in your pattern to denote "one-or-more of".

.NET regex (like most other flavors) supports these shorthands:

  • ?: "zero-or-one" {0,1}
  • *: "zero-or-more" {0,}
  • +: "one-or-more" {1,}

They may take some getting used to, but they're also pretty standard.
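
To see that each shorthand behaves like its brace form, a small sketch along these lines can help. The helper name and the sample inputs are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierShorthands
{
    // Returns the text of the first match, or an empty string when there is no match.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // "one-or-more": both forms consume the whole run of a's.
        Console.WriteLine(FirstMatch("a{1,}", "aaab") == FirstMatch("a+", "aaab"));  // True

        // "zero-or-more": both forms produce an empty match when no 'a' is present.
        Console.WriteLine(FirstMatch("a{0,}", "b") == FirstMatch("a*", "b"));        // True

        // "zero-or-one": both forms take at most one 'b' after the 'a'.
        Console.WriteLine(FirstMatch("ab{0,1}", "abb") == FirstMatch("ab?", "abb")); // True
    }
}
```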

Note on C# @-quoted string literals

While doubling the backslashes in string literals for regex patterns is the norm in e.g. Java (out of necessity), in C# you have the option of using @-quoted string literals.

That is, these pairs of strings are identical:

"(http://|www\\.)"
@"(http://|www\.)"

"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"

Using @ can lead to more readable regex patterns because a literal backslash doesn't have to be doubled (although, on the other hand, a double quote must now be doubled instead).
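
A quick way to convince yourself that the two forms produce the same string is to compare them directly. This is only a sketch; the class and field names are made up, and the quoted sample is an assumption added to show the doubled double quote.

```csharp
using System;

public static class VerbatimStrings
{
    // Regular literal: each backslash meant for the regex has to be written twice.
    public const string Regular = "(http://|www\\.)";

    // Verbatim literal: backslashes are taken as-is.
    public const string Verbatim = @"(http://|www\.)";

    // The trade-off: inside a verbatim literal a double quote is written as "".
    public const string QuotedRegular = "say \"hi\"";
    public const string QuotedVerbatim = @"say ""hi""";

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim);             // True
        Console.WriteLine(QuotedRegular == QuotedVerbatim); // True
    }
}
```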

polygenelubricants
  • The alternative is to escape the dash `\-`, but personally I prefer putting it at the end of the range, too. – Tomalak Jun 13 '10 at 14:38
2

Add the hyphen as the first or last character in the character class.

John Gietzen
2

Escape the hyphen:

 Regex("((http://|www\\.)([A-Z0-9.\-:]{1,})\\.[0-9A-Z?;~&#=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase);
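
For reference, here is a small sketch of how the escaped hyphen can be written in C# source, either with a doubled backslash in a regular string or with a single backslash in an @-quoted string. Only the first character class is tested, anchored with ^ and $ for the sake of the demonstration; the class name, method name, and sample inputs are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedHyphen
{
    // Regular literal: "\\-" in C# source becomes the two characters \- seen by the regex engine.
    public const string Regular = "^[A-Z0-9.\\-:]+$";

    // Verbatim literal: the single backslash reaches the regex engine unchanged.
    public const string Verbatim = @"^[A-Z0-9.\-:]+$";

    // True when the whole input consists of characters from the class.
    public static bool IsHostPart(string input)
    {
        return Regex.IsMatch(input, Verbatim, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim);    // True
        Console.WriteLine(IsHostPart("abc-xyz"));  // True: the escaped hyphen is a literal
        Console.WriteLine(IsHostPart("abc/xyz"));  // False: there is no longer a range covering '/'
    }
}
```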
Aaron Butacov
  • Aaron, you answered my question, but just to update it would be \\ while doing so, i.e. Regex("((http://|www\\.)([A-Z0-9.\\-:]{1,})\\.[0-9A-Z?;~=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase); – semmy Jun 13 '10 at 14:40
  • @Maria: The double backslash is not part of the regex, it's part of C# string mechanics. Don't confuse escaping mechanisms! – Tomalak Jun 13 '10 at 14:51
  • Tomalak is right, sorry about that, I forgot it was C# for a minute. – Aaron Butacov Jun 13 '10 at 14:58
  • You should be using escaped strings: @"..." rather than standard strings. – Charles Jul 09 '10 at 03:42