How can I change an absolute URL within a paragraph:

<p>http://www.google.com</p>

into an HTML link inside the paragraph:

<p><a href="http://www.google.com">http://www.google.com</a></p>

There can be a lot of paragraphs. I want the regex to extract the generic URL value from this: <p>url</p>, and put it into a template like this: <p><a href="url">url</a></p>

Is there a short way to do this? Can it be done with the Regex.Replace() method?

BTW: the regular expression used to match absolute URLs can be something like this: ^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&amp;%\$#_]*)?$ (taken from MSDN)

jwaliszko
  • Avoid using a regex on HTML. Have a look at this question for alternatives: http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c – dan1111 Sep 21 '12 at 14:57
  • Thanks for the clue. But this will actually be used against fairly short HTML code, so using a regex should be an acceptable solution here. – jwaliszko Sep 21 '12 at 14:59

2 Answers


From your regex, remove the leading ^ and the trailing $. Together those anchors mean "match the whole input string from start to end", so the pattern would never match a URL embedded in other text.

string regexPattern = @"(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&amp;%\$#_]*)?";

string input = @"<p>http://www.google.com</p>";

var reg = new Regex(regexPattern, RegexOptions.IgnoreCase);

// $0 is a substitution that refers to the text matched by the whole pattern
var output = reg.Replace(input, "<a href=\"$0\">$0</a>");

More about substitutions: http://msdn.microsoft.com/en-us/library/ewy2t5e0.aspx
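Since the question mentions many paragraphs, here is a minimal sketch of the same replacement wrapped in a reusable helper. The class name UrlLinker and the method name Linkify are hypothetical, chosen only for illustration. The pattern is the one from this answer, unchanged.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (names are illustrative, not from the original answer).
public static class UrlLinker
{
    // The MSDN URL pattern with the ^ and $ anchors removed, as described above.
    private static readonly Regex UrlRegex = new Regex(
        @"(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&amp;%\$#_]*)?",
        RegexOptions.IgnoreCase);

    // Wraps every URL found in the input in an anchor tag.
    // Regex.Replace substitutes all matches, so any number of paragraphs is handled in one call.
    public static string Linkify(string html)
    {
        return UrlRegex.Replace(html, "<a href=\"$0\">$0</a>");
    }
}
```

Calling UrlLinker.Linkify("<p>http://www.google.com</p>") should return <p><a href="http://www.google.com">http://www.google.com</a></p>. Note that this sketch wraps every URL it finds, including one that already sits inside an href attribute, so it only suits input that contains no links yet.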

Eldar

Try to use this regex:

(?<!\")(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&amp;%\$#_]*)?(?!\")

to avoid matching URLs that are already enclosed in double quotes, as in <a href="http://www.google.com">.

Sample code:

// In a verbatim string (@"..."), a double quote is written as "" rather than \".
var inputString = @"<p>http://www.google.com</p><p><a href=""http://www.google.com"">my web link</a></p>";
var pattern = @"(?<url>(?<!"")(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&amp;%\$#_]*)?(?!""))";
// ${url} refers to the text captured by the named group "url".
var result = Regex.Replace(inputString, pattern, "<a href=\"${url}\">${url}</a>");

Explanation:

(?<!subexpression) Zero-width negative lookbehind assertion.

(?!subexpression) Zero-width negative lookahead assertion.

(?<name>subexpression) Captures the matched subexpression into a named group.
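As an alternative illustration of the named-group construct, the following sketch matches only a URL that is the sole content of a paragraph, which is the literal case in the question. The pattern here is a deliberately simplified, hypothetical one (not the MSDN regex), and the names ParagraphUrlLinker and Linkify are assumptions made for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (names are illustrative, not from the original answer).
public static class ParagraphUrlLinker
{
    // Simplified assumption: a URL is http, https or ftp followed by "://"
    // and then any run of characters that are not whitespace, <, > or ".
    // The named group "url" captures just the URL between <p> and </p>.
    private static readonly Regex ParagraphUrl = new Regex(
        @"<p>\s*(?<url>(?:https?|ftp)://[^\s<>""]+)\s*</p>",
        RegexOptions.IgnoreCase);

    // Rewrites <p>url</p> as <p><a href="url">url</a></p>.
    // Paragraphs whose content is anything else are left as they are.
    public static string Linkify(string html)
    {
        return ParagraphUrl.Replace(html, "<p><a href=\"${url}\">${url}</a></p>");
    }
}
```

Because the pattern requires the surrounding <p> and </p> tags, a paragraph that already contains an <a> element does not match and is left unchanged, so no lookbehind or lookahead is needed in this variant. The trade-off is that a URL mixed with other text inside a paragraph is not linked.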

Ria