1

I need to extract the src of the first img tag from the following string using a regex. How can I do that?

><div dir="ltr" style="text-align: left;" trbidi="on"><div class="MsoNormal"
 style="background: white; line-height: 15.0pt; margin-bottom: .0001pt; margin-bottom: 0in; mso-outline-level: 2; vertical-align: baseline;"><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-c-ugY7XUnYo/UoJtj0dzvKI/AAAAAAAAACA/qWtvYnP9wfc/s1600/Screen+shot+2013-11-12+at+10.03.25+AM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="257" src="http://1.bp.blogspot.com/-c-ugY7XUnYo/UoJtj0dzvKI/AAAAAAAAACA/qWtvYnP9wfc/s320/Screen+shot+2013-11-12+at+10.03.25+AM.png" width="320" /></a></div><h4><span style="background-color: transparent;">With over 150,000 pet care professionals in the United States, your ability to differentiate your business is critical to long-term sustainable growth.  By focusing on the customer experience you can gain the loyalty of prospective pet parents and continue to thrive with your current pack.</span><span style="background-color: transparent;">  </span><span style="background-color: transparent;">Below are 5 ways to differentiate your pet business so you have a leg up on your local competitors.</span></h4></div><div class="MsoNormal"><div
Vignesh Kumar A
Sampath

2 Answers

4

Don't use regex to parse HTML. Use a real HTML parser such as HtmlAgilityPack:

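// Requires: using System.Collections.Generic; using System.Linq; using System.Net;
var html = WebUtility.HtmlDecode(yourtext); // only needed if the input is HTML-encoded
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null when nothing matches, so check before enumerating.
var imgs = doc.DocumentNode.SelectNodes("//img[@src]");
var urls = imgs == null
    ? new List<string>()
    : imgs.Select(img => img.Attributes["src"].Value).ToList();
var firstUrl = urls.FirstOrDefault(); // the first img's src, or null if there is none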
carla
L.B
  • I agree that HTML shouldn't be parsed with regex, but in this simple case a regex can be handy and practical. – Johnny Feb 12 '14 at 08:14
3

Try this pattern:

<img.+?src=[\"'](.+?)[\"'].*?>

string src = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
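The one-liner above works for the sample input, but `.+?` can run past the end of an img tag that has no src and pick up a src from a later tag, and `.` does not match newlines unless `RegexOptions.Singleline` is set. Below is a minimal sketch of a slightly stricter variant, not a replacement for a real parser. The class name `ImgSrc`, the method `FirstSrc`, and the example.com URL are made up for illustration; it also assumes no attribute value before src contains a `>` character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrc
{
    // [^>]* keeps the match inside a single <img ...> tag and, unlike '.',
    // also matches newlines. \s before "src" avoids matching data-src.
    // Both alternatives capture into the same named group "url".
    private static readonly Regex FirstImg = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)')",
        RegexOptions.IgnoreCase);

    // Returns the src of the first <img> tag that has one, or null otherwise.
    public static string FirstSrc(string html)
    {
        Match m = FirstImg.Match(html ?? string.Empty);
        return m.Success ? m.Groups["url"].Value : null;
    }

    public static void Main()
    {
        // Hypothetical input, shaped like the markup in the question.
        string html = "<a href='x.png'><img border=\"0\" src=\"http://example.com/a.png\" width=\"320\" /></a>";
        Console.WriteLine(FirstSrc(html)); // prints http://example.com/a.png
    }
}
```

Returning null on no match makes the "no image" case explicit, whereas `Groups[1].Value` on a failed match silently yields an empty string.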


Vignesh Kumar A
  • You have to escape < and > with backslashes, so instead of "<" you have to put "\\<". – Georg Feb 12 '14 at 08:00
  • Fantastic. It's working. I would also like to learn regex like you. Could you tell me where I should start? Thanks a lot. :) – Sampath Feb 12 '14 at 08:13