I have a mix of text and HTML, and I want to gather the CSS classes of all img tags whose class matches a certain pattern, then loop over them.

For example:

<img src="sample1.gif" class="get-this-tag-1" />This is text. This is text. This is text. This is text. This is text. This is text. <img src="sample2.gif" class="image" />This is text. This is text. This is text. This is text. This is text. This is text. <img src="sample3.gif" class="get-this-tag-2" />This is text. This is text. This is text. This is text. This is text. This is text.

In the sample we have 3 images with different classes: get-this-tag-1, image and get-this-tag-2. I only want to retrieve the classes that match get-this-tag- and have them in a loop.

foreach ($classes as $class) {
  // do something
}

Is this possible? Or is there a more optimal way of doing what I want to achieve?

Thank you in advance!

deceze
druesome
  • See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Borealid Aug 23 '10 at 02:07
  • I need a regexp that parses SO questions in HTML for the words `parse`, `X?HTML` and some unspecified XML tag and adds a comment pointing to bobince's canonical answer. Is this the right place to ask? – msw Aug 23 '10 at 02:13

4 Answers

First of all, you shouldn't process HTML with RegExes. You may well process the class names with RegExes, but not the HTML. That should be done using a proper parser.

Your example would be trivial to do in JavaScript and is a one-liner in jQuery. As such, you may want to look into phpQuery, which makes it just as easy in PHP. In my experience it's rather slow for large sites, though.
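To illustrate the split described above (a parser for the HTML, a regex only for the class names), here is a minimal sketch using PHP's built-in DOM extension rather than phpQuery. The function name `imgClassesMatching` and the default pattern are assumptions made for this example, not part of any library.

```php
<?php
// Sketch: let DOMDocument parse the markup, then test each class name with a regex.
// imgClassesMatching() and its default pattern are hypothetical, for illustration only.
function imgClassesMatching(string $html, string $pattern = '/^get-this-tag-\d+$/'): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $classes = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // A class attribute may hold several space-separated names.
        $names = preg_split('/\s+/', $img->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($names as $name) {
            if (preg_match($pattern, $name)) {
                $classes[] = $name;
            }
        }
    }
    return $classes;
}

$html = '<img src="sample1.gif" class="get-this-tag-1" />This is text. '
      . '<img src="sample2.gif" class="image" />This is text. '
      . '<img src="sample3.gif" class="get-this-tag-2" />This is text.';

foreach (imgClassesMatching($html) as $class) {
    echo $class, "\n"; // do something with each matching class
}
```

Because the regex only ever sees individual class names, it stays simple and cannot be confused by the surrounding markup.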

deceze
  • Thanks for pointing me in the right direction. I used loadHTML and xpath to parse the HTML and it works great. :) – druesome Aug 24 '10 at 03:36

I've done this with regex, but it's a lot less painful with the newer DOM functions. You didn't specify which language you are using, but it appears to be PHP, which has adequate functions for this job:

http://us.php.net/dom
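As a rough idea of what the DOM route can look like, the sketch below uses `DOMDocument::loadHTML` together with `DOMXPath` to select only the img elements whose class attribute contains a name starting with a given prefix. The function name `imgClassesByPrefix` is an assumption for this example, and the prefix is assumed not to contain an apostrophe, since it is placed directly into the XPath string.

```php
<?php
// Sketch: select matching <img> elements with an XPath query.
// imgClassesByPrefix() is a hypothetical helper, not a built-in function.
function imgClassesByPrefix(string $html, string $prefix = 'get-this-tag-'): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Prepending a space lets " prefix" match a class name at the start
    // of the attribute as well as one that follows other class names.
    $query = "//img[contains(concat(' ', normalize-space(@class)), ' " . $prefix . "')]";

    $classes = [];
    foreach ($xpath->query($query) as $img) {
        // This is the whole class attribute, which may list more than one name.
        $classes[] = $img->getAttribute('class');
    }
    return $classes;
}

$html = '<img src="sample1.gif" class="get-this-tag-1" />This is text. '
      . '<img src="sample2.gif" class="image" />This is text. '
      . '<img src="sample3.gif" class="get-this-tag-2" />This is text.';

foreach (imgClassesByPrefix($html) as $class) {
    echo $class, "\n";
}
```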

If you still want to use regex, this might get you started:

$matches = array();
preg_match_all ( '|<img.*>|siU', $data, $matches, PREG_PATTERN_ORDER );
print_r($matches);
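
If you stay with regex, one possible next step is to capture the class attribute inside each img tag and keep only the names with the wanted prefix. This is a sketch under the assumption that attributes are quoted and that no attribute value contains a `>` character; the pattern is illustrative and will break on markup that violates those assumptions. The sample `$data` is made up for the example.

```php
<?php
// Sample input, assumed for illustration.
$data = '<img src="sample1.gif" class="get-this-tag-1" />This is text. '
      . '<img src="sample2.gif" class="image" />This is text. '
      . '<img src="sample3.gif" class="get-this-tag-2" />This is text.';

// Group 1 is the opening quote, group 2 the class attribute value.
preg_match_all(
    '/<img\b[^>]*\sclass=(["\'])(.*?)\1[^>]*>/si',
    $data,
    $matches,
    PREG_SET_ORDER
);

$classes = [];
foreach ($matches as $match) {
    // Split the attribute value into individual class names.
    foreach (preg_split('/\s+/', $match[2], -1, PREG_SPLIT_NO_EMPTY) as $name) {
        if (strpos($name, 'get-this-tag-') === 0) {
            $classes[] = $name;
        }
    }
}

foreach ($classes as $class) {
    echo $class, "\n";
}
```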
Hans

Do you need to know the class names on the server, or in the browser? If it is just the browser, I would recommend using jQuery to get the data from JavaScript.

iWantSimpleLife
  • Hi, I figured it out the hard way. Used this to get the image classes with jquery: $j("img").each(function() { var cname = $j(this).attr("class") do stuff... }); I think this is the best way to approach it. Thanks for your suggestion :) – druesome Aug 25 '10 at 00:54
  • I think the more efficient way would be to use $j("img.class") or $j("img[class=someclass]") to get all the image references with the class. – iWantSimpleLife Aug 27 '10 at 07:09

I'm not sure I fully understand what you mean, but if you only want to retrieve the class names, you can do:

preg_match_all("/get-this-tag-\d+/", $string, $classes);

All the class names that begin with get-this-tag- will be in $classes[0], because preg_match_all fills $classes with one sub-array per pattern group and index 0 holds the full matches.
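
A short sketch of how that result could be looped over is shown below. Two things are worth keeping in mind: `preg_match_all` fills its third argument with an array of arrays, so the matched strings sit at index 0, and this pattern matches the text anywhere in `$string`, not only inside class attributes. The sample `$string` and the use of `array_unique` are assumptions for this example.

```php
<?php
// Sample input, assumed for illustration.
$string = '<img src="sample1.gif" class="get-this-tag-1" />This is text. '
        . '<img src="sample2.gif" class="image" />This is text. '
        . '<img src="sample3.gif" class="get-this-tag-2" />This is text.';

preg_match_all('/get-this-tag-\d+/', $string, $matches);

// $matches[0] holds the full-pattern matches; drop duplicates before looping.
$classes = array_unique($matches[0]);

foreach ($classes as $class) {
    echo $class, "\n"; // do something
}
```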

Toto