I am trying to get my regular expression to work, to no avail.

All I want to do is find the image tags in an HTML string so I can replace them.

This is what I think should work:

var regex = new Regex(@"<img.*>");

return regex.Replace(content, "<p><i><b>(See Image Online)</b></i></p>");

And it does work partially, but it seems to be stripping out more than just the image tag.

This is an example of what I want to match:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAM0AAAD
 NCAMAAAAsYgRbAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5c
 cllPAAAABJQTFRF3NSmzMewPxIG//ncJEJsldTou1jHgAAAARBJREFUeNrs2EEK
 gCAQBVDLuv+V20dENbMY831wKz4Y/VHb/5RGQ0NDQ0NDQ0NDQ0NDQ0NDQ
 0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0PzMWtyaGhoaGhoaGhoaGhoaGhoxtb0QGho
 aGhoaGhoaGhoaGhoaMbRLEvv50VTQ9OTQ5OpyZ01GpM2g0bfmDQaL7S+ofFC6x
 v3ZpxJiywakzbvd9r3RWPS9I2+MWk0+kbf0Hih9Y17U0nTHibrDDQ0NDQ0NDQ0
 NDQ0NDQ0NTXbRSL/AK72o6GhoaGhoRlL8951vwsNDQ0NDQ1NDc0WyHtDTEhD
 Q0NDQ0NTS5MdGhoaGhoaGhoaGhoaGhoaGhoaGhoaGposzSHAAErMwwQ2HwRQ
 AAAAAElFTkSuQmCC" alt="beastie.png">
Greg Gum
  • Regular expressions aren't a good tool for working with HTML (the worst one, in my opinion). Why don't you use [XmlDocument](https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmldocument?view=netframework-4.7.2) or [HTML Agility Pack](https://html-agility-pack.net/)? – vasily.sib Jan 30 '19 at 03:12
  • You should never use regex (regular expressions) on HTML, which is not regular. – jdweng Jan 30 '19 at 03:34
  • If you [.Write](https://learn.microsoft.com/en-us/dotnet/api/system.windows.forms.htmldocument.write) the HTML content to the HtmlDocument created by a WebBrowser class (not the Control), you can use the [Images collection](https://learn.microsoft.com/en-us/dotnet/api/system.windows.forms.htmldocument.images) to directly read all the related Links, using the `src` attribute. – Jimi Jan 30 '19 at 04:34

1 Answer

You need either

new Regex(@"<img.*?>");

if lazy quantifiers are supported, or if not,

new Regex(@"<img[^>]*>");

Your problem is that `.*` is greedy: your regular expression does not stop at the first ">" it finds but runs on to the LAST one.
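As a rough sketch of how both patterns could be applied in C#: the class and method names below (`ImgTagReplacer`, `ReplaceWithNegatedClass`, `ReplaceWithLazy`) are made up for illustration, and the sample HTML is a shortened stand-in for the image in the question. One thing to keep in mind is that in .NET `.` does not match a newline by default, so the lazy pattern most likely needs `RegexOptions.Singleline` to cover a tag whose base64 data wraps across lines, whereas `[^>]*` matches newlines without any option.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original question or answer.
public static class ImgTagReplacer
{
    // Replacement text taken from the question.
    private const string Placeholder = "<p><i><b>(See Image Online)</b></i></p>";

    // [^>]* stops at the first '>' and, being a negated class, also matches newlines.
    private static readonly Regex NegatedClass = new Regex(@"<img[^>]*>");

    // .*? is lazy, so it stops at the first '>'.
    // Singleline makes '.' match '\n' as well (assumed to be needed for wrapped tags).
    private static readonly Regex Lazy = new Regex(@"<img.*?>", RegexOptions.Singleline);

    public static string ReplaceWithNegatedClass(string content)
    {
        return NegatedClass.Replace(content, Placeholder);
    }

    public static string ReplaceWithLazy(string content)
    {
        return Lazy.Replace(content, Placeholder);
    }

    public static void Main()
    {
        // Shortened stand-in for the multi-line <img> tag from the question.
        string html = "<p>before</p><img src=\"data:image/png;base64,AAAA\nBBBB\" alt=\"beastie.png\"><p>after</p>";

        // Both calls print:
        // <p>before</p><p><i><b>(See Image Online)</b></i></p><p>after</p>
        Console.WriteLine(ReplaceWithNegatedClass(html));
        Console.WriteLine(ReplaceWithLazy(html));
    }
}
```

Neither pattern copes with a ">" inside an attribute value, which is one reason the comments above suggest an HTML parser for anything beyond simple input.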

Honza