
I need a regular expression that matches an image tag whose alt attribute is either missing or empty. For instance, it should match an img tag that has alt="" or no alt at all, but not one that has alt="y".

The image tags might have line breaks in them, and there could be more than one image tag per line.

Currently, what I have is:

<img.@(~[\r\n]|[\r\n])*.@(~(alt=".#"))*.@(~[\r\n]|[\r\n])*.@/>

and I'm testing it on this:

<img alt="" />
<img src="xyz.jpg"
alt="y" />
<img xxxx ABC /> 
<img xxxxxx ABC />
<img src="xyz.jpg" alt="y" />

But my regex returns every image tag, including the 2nd and 5th ones, which I don't want returned.

I'm working in Microsoft Expression Web.

MNRSullivan
  • Regular Expressions are not Parsers. They are ill suited to dealing with HTML. – g.d.d.c Apr 17 '12 at 19:17
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Joe Apr 17 '12 at 19:38
  • This is one of the scenarios where the people regurgitating the "you can't do this with regex" line are right. Regular expressions are unable to deal with nested tags, so things like `` are unparsable with regex. Regex will work ONLY if you can guarantee that img tags will never contain other img tags. – Sam I am says Reinstate Monica Apr 17 '12 at 22:10
  • None of the img tags I'm dealing with will have nested img tags within them, but I still can't think of any way to do this with regex. – MNRSullivan Apr 19 '12 at 16:11
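
For the simple, non-nested case described in the last comment, a negative lookahead is one way to express "no alt attribute with something between the quotes". The following is only a sketch in JavaScript regex syntax, not the find-and-replace dialect used by Expression Web. The names IMG_WITHOUT_ALT_TEXT and findImagesWithoutAltText are made up for illustration, and the pattern assumes that no attribute value contains a ">" character and that the text " alt=" never appears inside another attribute's value.

```javascript
// Sketch: match <img> tags whose alt attribute is missing or empty.
// Assumption: no ">" inside attribute values, no commented-out markup.
// The lookahead rejects a tag if, before its closing ">", it contains
// whitespace + alt = followed by a non-empty quoted or unquoted value.
const IMG_WITHOUT_ALT_TEXT =
  /<img\b(?![^>]*\salt\s*=\s*(?:"[^"]+"|'[^']+'|[^\s"'>]+))[^>]*>/gi;

// Hypothetical helper: returns every matching tag as a string.
function findImagesWithoutAltText(html) {
  return html.match(IMG_WITHOUT_ALT_TEXT) || [];
}

// The test input from the question; [^>]* also matches line breaks,
// so the tag split over two lines is handled.
const sample = [
  '<img alt="" />',
  '<img src="xyz.jpg"',
  'alt="y" />',
  '<img xxxx ABC /> ',
  '<img xxxxxx ABC />',
  '<img src="xyz.jpg" alt="y" />'
].join('\n');

const found = findImagesWithoutAltText(sample);
console.log(found);
```

On the sample this should return the 1st, 3rd and 4th tags and skip the two that have alt="y". It is still pattern matching, not parsing, so the caveats in the comments above apply to anything less regular than this input.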

2 Answers


You might want to take a look at XPath instead to do this. If the document is well-formed XML, you can load it using XmlDocument in .NET and then call SelectNodes("//img[@alt='']") to select img elements with an empty alt attribute. To also select images that have no alt attribute at all, use SelectNodes("//img[not(@alt) or @alt='']").

David Z.
  • Nice, David. But is there some tool that would allow using jQuery/CSS selectors instead of XPath? – Tomas Apr 17 '12 at 19:38
  • Larry's response below looks pretty great. Maybe that will help. In terms of the right solution, I think it'll depend on what op's needs are. – David Z. Apr 17 '12 at 19:48
  • Well David, I thought the OP would want a server-side solution, as yours is, and I'm curious whether the selector solution can also be server-side... – Tomas Apr 17 '12 at 20:29
  • If you're looking for a jQuery selector-like solution, a cursory look turned up a project called Fizzler that sounds promising. http://code.google.com/p/fizzler/. Obvious downside is the need to include additional .NET libraries to leverage it. – David Z. Apr 17 '12 at 20:49

Your best bet would be to use jQuery to parse the string into HTML nodes and then filter them using a selector.

var str = '<img alt="" /><img src="xyz.jpg" alt="y" /><img xxxx ABC /> <img xxxxxx ABC /><img src="xyz.jpg" alt="y" />';
var elementsWithoutAlt = $( str ).filter( ':not([alt])' );
console.log(elementsWithoutAlt.length);

':not([alt])' will find all the elements without an alt attribute. 'img:not([alt])' will find all the img elements without an alt attribute. To also catch images with an empty alt attribute, add 'img[alt=""]' to the selector, separated by a comma.

Demo: (Click render to see it in action) http://jsbin.com/imeyam/3/edit

jQuery Info http://api.jquery.com/has-attribute-selector/

Larry Battle
  • Thanks Larry, and David, for the suggestions. – MNRSullivan Apr 17 '12 at 19:59
  • I'm currently trying to implement this jQuery solution. I would like it to be able to read through a page and output the source code of each img tag without an alt attribute. How could I do that? – MNRSullivan Apr 19 '12 at 18:47
  • You should experiment with jQuery so you can understand it better. What part are you having trouble with? $( str ) returns a collection of DOM elements and filter( ':not([alt])' ) returns the elements that don't have the alt attribute. – Larry Battle Apr 20 '12 at 03:09
  • I'm looking for some way to display the information in a manner which makes it easily readable so someone could then go through the html and edit the img tags which are returned. I haven't found a great way to do this yet with jQuery. – MNRSullivan Apr 21 '12 at 05:04
  • Why not display the html for elementsWithoutAlt in a textarea tag? – Larry Battle Apr 23 '12 at 21:52
  • Well, I've almost got it. http://jsbin.com/arofix/edit#source I'm not sure what the problem is; the textarea just says [object Object]. – MNRSullivan Apr 24 '12 at 18:26
  • I've also tried $(document.body).append( $('').text( imagesWithoutAlt ) ); but that didn't work. Replacing .text with .val just output a textarea with the [object Object] value again – MNRSullivan Apr 24 '12 at 19:45
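
The [object Object] output most likely comes from handing the jQuery collection itself to .text() or .val(), which turns the object into a string instead of serializing the elements it holds. One possible approach is to read each element's outerHTML and join the results. The sketch below uses a made-up helper name, toSourceText, and relies only on the standard outerHTML property, so it works on any array-like collection of elements, including a jQuery object.

```javascript
// Hypothetical helper: turn a collection of elements into readable source text.
// Works on arrays and array-likes (e.g. a jQuery object or a NodeList),
// because it only needs .length, numeric indices and each item's outerHTML.
function toSourceText(elements) {
  return Array.prototype.map
    .call(elements, function (el) { return el.outerHTML; })
    .join('\n');
}

// In a page with jQuery loaded, usage might look like this (untested sketch):
//   var imagesWithoutAlt = $('img:not([alt]), img[alt=""]');
//   $('<textarea>').val(toSourceText(imagesWithoutAlt)).appendTo(document.body);
```

Passing the resulting string to .val() gives the textarea one tag per line, which should be easier to read and copy than the raw collection.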