I am new to regular expressions. I am trying to find the images that don't have a BORDER attribute, so the result should be the second image. The text I am trying to match with the regex is below.

<IMG onerror="this.errored=true;" USEMAP="#Map-43" BORDER="0"/>
<IMG onerror="this.errored=true;" USEMAP="#Map-43" />
<IMG onerror="this.errored=true;" USEMAP="#Map-43" BORDER="0"/>    

I tried the following regex, but it didn't work:

<IMG\\s[^((>)&(?!BORDER)]*>

Can anyone help with this, please?

  • str.indexOf("BORDER") will be -1 for the second string. You can use that. Do you have all three statements in the same variable? – madhairsilence Oct 05 '12 at 07:10

3 Answers

You can use HtmlAgilityPack to parse the HTML:

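// Requires: using System.Linq; and a reference to HtmlAgilityPack.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Keep only the <img> elements that have no border attribute.
var imgs = doc.DocumentNode.Descendants("img")
    .Where(n => n.Attributes["border"] == null)
    .ToList();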

PS: See also this: RegEx match open tags except XHTML self-contained tags

L.B

The better choice would be to use an html parser for such a problem.

But the main problem with your regex is that you put the lookahead inside a character class, so all of its characters were treated as literal characters.

<IMG\s(?:(?!BORDER)[^>])*>

should work better. See it on Regexr.

But that's only to explain your regex problem. To solve your programming task, please use L.B's answer.

Working example:

// Requires: using System.Linq; using System.Text.RegularExpressions;
String html = "<IMG onerror=\"this.errored=true;\" USEMAP=\"#Map-43\" BORDER=\"0\"/><IMG onerror=\"this.errored=true;\" USEMAP=\"#Map-43\" /><IMG onerror=\"this.errored=true;\" USEMAP=\"#Map-43\" BORDER=\"0\"/>";
// Print the first match, which is the second tag (the only one without BORDER).
Console.WriteLine(Regex.Matches(html, @"<IMG\s(?:(?!BORDER)[^>])*>").Cast<Match>().ToList()[0]);
Console.ReadLine();
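
To make the character-class point concrete, here is a small sketch in JavaScript (the language of the jQuery answer in this thread; the two patterns behave the same way in .NET for this input). The variable names are only illustrative. Inside `[^...]` the text `(?!BORDER)` is just the set of characters `(`, `?`, `!`, `B`, `O`, `R`, `D`, `E`, `)`, so the original pattern stops at the `E` in `USEMAP` and matches none of the tags. The corrected pattern checks the lookahead before each character it consumes.

```javascript
// The three sample tags from the question, one per line.
const html = [
  '<IMG onerror="this.errored=true;" USEMAP="#Map-43" BORDER="0"/>',
  '<IMG onerror="this.errored=true;" USEMAP="#Map-43" />',
  '<IMG onerror="this.errored=true;" USEMAP="#Map-43" BORDER="0"/>',
].join('\n');

// Original attempt: the "lookahead" sits inside [^...], so it is only a
// list of excluded literal characters (including B, O, R, D and E).
const attempt = /<IMG\s[^((>)&(?!BORDER)]*>/g;

// Corrected pattern: before consuming each non-'>' character, assert that
// the text at that position does not start with BORDER.
const fixed = /<IMG\s(?:(?!BORDER)[^>])*>/g;

const attemptMatches = html.match(attempt) || [];
const fixedMatches = html.match(fixed) || [];

console.log(attemptMatches.length); // 0: the class rejects the E in USEMAP
console.log(fixedMatches);          // only the second tag
```

Both patterns are case-sensitive, so a lowercase `border` attribute would not be excluded unless the `i` flag is added, and the word BORDER inside another attribute's value would also block a match. That is one more reason to prefer the parser-based answer.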
stema
  • @L.B, What do you mean? Have you seen the Regexr example? – stema Oct 05 '12 at 07:35
  • Yes, and it says `0 capturing groups:` – L.B Oct 05 '12 at 07:36
  • Yes, of course, there is no capturing group in my regex. The whole regex matches the complete second row/tag. – stema Oct 05 '12 at 07:40
  • `Regex.Matches(html, @"<IMG\s(?:(?!BORDER)[^>])*>").Cast<Match>().ToList();` also gives 0 matches. – L.B Oct 05 '12 at 07:41
  • I added working code using almost exactly your line of code. I don't understand your problem. – stema Oct 05 '12 at 08:02
  • Hi Stema, I think your solution is working. BTW, my problem isn't resolved yet. – Prasad thankappan Oct 05 '12 at 10:00
  • @Prasadthankappan, if my solution is working but does not resolve your problem, what is your problem then? – stema Oct 05 '12 at 10:07
  • Hi Stema, I think your solution is working. BTW, my problem isn't resolved yet. I have to check one more condition, i.e. the image may also contain an ALT="" attribute. So the requirement is to find the images that have ALT, or that have neither ALT nor BORDER. Many thanks in advance. – Prasad thankappan Oct 05 '12 at 10:10
  • @Prasadthankappan, I think you'd better go with the answer from L.B; this isn't a task for regex. – stema Oct 05 '12 at 12:13
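
For the follow-up requirement in the comments (images that have ALT, or that have neither ALT nor BORDER), a parser remains the more robust route. If only plain string tools are available, one possible sketch is to split the job in two steps, in the spirit of the `indexOf` comment on the question: first pull out each IMG tag, then test each tag for the attributes separately. The helper names `hasAttribute` and `selectImages` and the sample tags are hypothetical, not from the thread, and the check assumes attributes are written as `NAME=` and that no attribute value contains `>`.

```javascript
// Hypothetical helper: true when the tag text contains NAME= preceded by
// whitespace. Assumes the attribute is written with an equals sign.
function hasAttribute(tag, name) {
  return new RegExp('\\s' + name + '\\s*=', 'i').test(tag);
}

// Hypothetical helper: keep tags that have ALT, or have neither ALT nor BORDER.
function selectImages(html) {
  // Step 1: extract every IMG tag (assumes no '>' inside attribute values).
  const tags = html.match(/<IMG\b[^>]*>/gi) || [];
  // Step 2: apply the attribute conditions to each tag on its own.
  return tags.filter((tag) => {
    const hasAlt = hasAttribute(tag, 'ALT');
    const hasBorder = hasAttribute(tag, 'BORDER');
    return hasAlt || (!hasAlt && !hasBorder);
  });
}

// Made-up sample input for illustration only.
const sample = [
  '<IMG USEMAP="#Map-43" BORDER="0"/>',
  '<IMG USEMAP="#Map-43" />',
  '<IMG USEMAP="#Map-43" ALT="" BORDER="0"/>',
].join('\n');

const selected = selectImages(sample);
console.log(selected); // the second and third sample tags
```

Keeping the tag extraction apart from the attribute tests avoids packing several negative conditions into one pattern, but it is still text matching and shares the usual weaknesses of regex on HTML.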

Another way is to get the images with no border attribute client-side, using jQuery and CSS selectors:

var $img = $('img').not('[border]');

Alberto De Caro