What is the easiest way to collect all links and e-mail addresses in a string into a list or array? I was using preg_match
in PHP, but in C# it looks like this works quite differently.

Deniz Dogan

Semas
-
Are you asking for a regex or are you asking how to use it in C#? – SLaks Jun 09 '10 at 14:09
-
Duplicate: http://stackoverflow.com/questions/591859/a-regex-that-validates-a-web-address-and-matches-an-empty-string – serhio Jun 09 '10 at 14:11
-
By "link" you mean http[s] only addresses or does it include mailto:, javascript:, and so on? – Humberto Jun 09 '10 at 14:11
2 Answers
1
Assuming that you already have a working regular expression, you can use the Regex class, like this:
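// Requires: using System.Text.RegularExpressions;
static readonly Regex linkFinder = new Regex(@"https?://[a-z0-9.]+/\S+|\S+@\S+\.\S+", RegexOptions.IgnoreCase);

foreach (Match match in linkFinder.Matches(someString)) {
    // Do things...
    string url = match.Value;    // the matched text
    int position = match.Index;  // zero-based offset of the match in someString
}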
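
Since the question asks for the results as a list, here is a minimal sketch that gathers every match into a `List<string>`. The names `LinkExtractor` and `FindAll` and the sample input are made up for illustration, and the pattern is only a loose placeholder in the spirit of the one above (the e-mail branch is written as `\S+@\S+\.\S+`), not a vetted URL or e-mail regex.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Placeholder pattern (an assumption, not a validated regex):
    // http(s) URLs that have a path, or any run of non-whitespace shaped like x@y.z.
    private static readonly Regex LinkFinder = new Regex(
        @"https?://[a-z0-9.]+/\S+|\S+@\S+\.\S+",
        RegexOptions.IgnoreCase);

    // Returns the text of every match, in the order the matches appear in the input.
    public static List<string> FindAll(string input)
    {
        var results = new List<string>();
        foreach (Match match in LinkFinder.Matches(input))
        {
            results.Add(match.Value);
        }
        return results;
    }
}
```

Calling `LinkExtractor.FindAll("Mail bob@example.com or see http://example.com/docs today")` should return the two items `bob@example.com` and `http://example.com/docs`. Because both branches end in `\S+`, punctuation that directly follows a link (such as a trailing comma or period) would be captured as part of the match, so real input probably needs a stricter pattern or some trimming afterwards.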

SLaks
-
@serhio: `\S+` should match all that. I'm primarily trying to demonstrate how to use the regex. – SLaks Jun 09 '10 at 14:16
-1
This should work for links:
https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
This should work for email addresses:
[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}
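
As a rough sketch of how these two patterns might be applied in C#, the snippet below runs each one separately and returns the matches as lists. `ContactScanner`, `Find`, `FindLinks`, and `FindEmails` are hypothetical names chosen for this example. Passing `RegexOptions.IgnoreCase` is an assumption on my part: the e-mail pattern lists only uppercase letters, so it needs a case-insensitive option to match ordinary lowercase addresses.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ContactScanner
{
    // The two patterns from this answer, copied unchanged.
    public const string LinkPattern = @"https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?";
    public const string EmailPattern = @"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}";

    // Collects the text of every match of `pattern` in `input`.
    public static List<string> Find(string pattern, string input, RegexOptions options)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern, options))
        {
            results.Add(match.Value);
        }
        return results;
    }

    public static List<string> FindLinks(string input)
    {
        return Find(LinkPattern, input, RegexOptions.IgnoreCase);
    }

    public static List<string> FindEmails(string input)
    {
        return Find(EmailPattern, input, RegexOptions.IgnoreCase);
    }
}
```

For the input `"Visit http://example.com:8080/a/b.html?x=1 or write to John.Doe@Example.org"`, `FindLinks` should return the full URL and `FindEmails` should return `John.Doe@Example.org`. Two limits are worth knowing: with `RegexOptions.None` the e-mail pattern finds nothing in `john@example.com`, and the `{2,4}` bound cuts longer top-level domains short, so `curator@gallery.museum` comes back as `curator@gallery.muse`.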

npinti
-
-1: There are top-level domains that this "email regex" will fail to match (e.g. the .museum TLD). And domains are normally written in lower case, while the pattern lists only uppercase letters, so as written it won't match any. Regex is the WRONG TOOL to find email addresses. – Richard Jun 09 '10 at 14:11
-
@Richard: Regexes are not the "wrong tool" to find emails. They are **exactly the right tool**. They are the **wrong** tool to **parse** and **validate**, but finding strings is THE purpose of a regex. – John Gietzen Jun 09 '10 at 14:16
-
@John: for any short regex there will be valid email addresses it fails to find. (E.g. with the one in the Q, many O'Reillys will be disappointed.) – Richard Jun 10 '10 at 10:53