Lei Yang's answer may be correct for your sample, but it will fail if src=SRC_VALUE comes right after <img, like this: <img src="/captcha?58428805" ...SOME_OTHER ATTR..>
This regex might help:
string toTest = @"<img style=""margin: 0;height:40px;width:115px;"" width=""115"" height=""40"" id=""captcha"" class=""captcha"" src=""/captcha?58428805"" alt="" Verification code with letters and numbers ""/>";
var regex = new Regex(@"<img.{0,}src=""(.+?)""");
Console.WriteLine(regex.Match(toTest).Groups[1].Value);
Explanation for <img.{0,}src="(.+?)" (note that each quote is doubled in the code above because it sits inside a verbatim @"..." string):
- <img : the string must contain <img
- .{0,} : matches zero or more occurrences of any character except line terminators after the <img (equivalent to .*)
- src=" : matches the literal src=" part somewhere after <img
- (.+?)" : . means any character except line terminators, + means occurring one or more times, and ? makes the quantifier lazy, so the group captures as few characters as possible and stops at the next "
This regex, however, will only return the last src value if your toTest string contains multiple <img> tags, because .{0,} is greedy and runs on to the last src=" it can find. So you need to Split your string per <img> tag and then apply the regex above to each tag:
string toTest = @"<img style=""margin: 0;height:40px;width:115px;"" width=""115"" height=""40"" id=""captcha"" class=""captcha"" src=""/captcha?58428805"" alt="" Verification code with letters and numbers ""/><img style=""margin: 0;height:40px;width:115px;"" width=""115"" height=""40"" id=""captcha"" class=""captcha"" src=""/captssscha?5842sss8805"" alt="" Verification code with letters and numbers ""/>";
var imgArr = Regex.Split(toTest, @"(<img[\s\S]+?\/>)").Where(l => l != string.Empty).ToArray(); // split the HTML string into <img> tags (Where/ToArray need using System.Linq)
var srcRegex = new Regex(@"<img.{0,}src=""(.+?)""",RegexOptions.Compiled | RegexOptions.Singleline);
foreach (string imgTag in imgArr) {
    Console.WriteLine(srcRegex.Match(imgTag).Groups[1].Value);
}
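As a possible alternative to splitting first, the same idea can be sketched with a single pattern and Regex.Matches. This is only a sketch, not part of the approach above: the class name ImgSrcExtractor and the method ExtractSources are hypothetical, and the pattern assumes double-quoted src values and no > character inside any attribute value. It replaces the greedy .{0,} with the lazy [^>]*?, so each match stays inside one tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the original answer.
public static class ImgSrcExtractor
{
    // <img\b    : the literal tag name
    // [^>]*?    : lazily skip other attributes without leaving the current tag
    // \ssrc=""  : whitespace followed by src=" (so data-src is not picked up)
    // ([^""]*)  : capture everything up to the closing quote
    private static readonly Regex SrcRegex = new Regex(
        @"<img\b[^>]*?\ssrc=""([^""]*)""",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the src value of every <img> tag found, in document order.
    public static List<string> ExtractSources(string html)
    {
        var sources = new List<string>();
        foreach (Match m in SrcRegex.Matches(html))
        {
            sources.Add(m.Groups[1].Value);
        }
        return sources;
    }

    public static void Main()
    {
        string toTest = @"<img id=""captcha"" src=""/captcha?58428805"" alt=""code""/><img src=""/captssscha?5842sss8805"" class=""captcha""/>";
        foreach (string src in ExtractSources(toTest))
        {
            Console.WriteLine(src);
        }
    }
}
```

With the two-tag sample in Main, this prints /captcha?58428805 and then /captssscha?5842sss8805. For anything beyond simple, well-formed markup, an HTML parser is likely to be more robust than either regex.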