
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I would like to pull some data from an external website. The HTML string looks like this (with no spaces or line breaks between the img tags):

<img class="car-type231" src="/2f36b523259e9871bfade01983c9cc91.png" title="toyota"/>
<img class="car-type211" src="/0abc9b3ae3ba4bbcb6d3593fad6c1450.png" title="nissan"/>
<img class="car-type311" src="/4528e30bb510b4289121b4c70cb48ea3.png" title="bmw"/>
<img class="car-type332" src="/64575fee55553623896c7fd587a33ac3.png" title="mercedes"/>
<img class="car-type544" src="/a4f32dd95976d76704795c471c9a08b8.png" title="audi"/>
etc...

I want to pull every src path and create an array that would look like this:

$matches[0] = '/2f36b523259e9871bfade01983c9cc91.png';
$matches[1] = '/0abc9b3ae3ba4bbcb6d3593fad6c1450.png';
etc...

I tried using preg_match with the pattern '#src="(.*?)"#', but it doesn't work: it returns all the HTML instead of just the paths.
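
A minimal sketch of the likely problem, assuming the goal is the array shown above. preg_match() stops at the first match, and its $matches[0] holds the whole matched text (src="...") rather than the path. preg_match_all() collects every match, and the captured paths end up in $matches[1]. The $html sample is just two of the tags from the question, concatenated.

```php
<?php
// Sample input: two of the <img> tags from the question, with no whitespace between them.
$html = '<img class="car-type231" src="/2f36b523259e9871bfade01983c9cc91.png" title="toyota"/>'
      . '<img class="car-type211" src="/0abc9b3ae3ba4bbcb6d3593fad6c1450.png" title="nissan"/>';

// preg_match() stops after the FIRST match: $first[0] is the full matched
// text (src="..."), $first[1] is only capture group 1 (the path).
preg_match('#src="(.*?)"#', $html, $first);

// preg_match_all() finds EVERY match. With the default PREG_PATTERN_ORDER,
// $all[0] lists the full matches and $all[1] lists the captured paths.
preg_match_all('#src="(.*?)"#', $html, $all);

print_r($first);
print_r($all[1]);
```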

Any help would be appreciated!

mat

2 Answers

The pony he comes...

Use a parser such as DOMDocument:

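// $html holds the markup fetched from the external site
$dom = new DOMDocument();
$dom->loadHTML($html);
$imgs = $dom->getElementsByTagName('img');
$l = $imgs->length;
$srcs = []; // use array() on PHP versions before 5.4
for ($i = 0; $i < $l; $i++) {
    // collect the src attribute of each <img>, in document order
    $srcs[$i] = $imgs->item($i)->getAttribute("src");
}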
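
A self-contained sketch of the same parser approach, wrapped in a hypothetical extractImgSrcs() helper. It assumes the dom extension is enabled (it is in most PHP builds), turns on libxml_use_internal_errors() so warnings about imperfect scraped markup are not printed, and iterates the DOMNodeList with foreach instead of an index loop.

```php
<?php
// Hypothetical helper: returns the src attribute of every <img> in an HTML string.
function extractImgSrcs(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them;
    // scraped markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $srcs = [];
    // DOMNodeList is Traversable, so foreach works directly.
    foreach ($dom->getElementsByTagName('img') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

$html = '<img class="car-type231" src="/2f36b523259e9871bfade01983c9cc91.png" title="toyota"/>'
      . '<img class="car-type211" src="/0abc9b3ae3ba4bbcb6d3593fad6c1450.png" title="nissan"/>';

print_r(extractImgSrcs($html));
```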
Niet the Dark Absol
  • I was going to answer with the pony, but you did it first! I also think you shouldn't be using regex to parse HTML – Hugo Dozois Oct 16 '12 at 02:02

You'll get a lot of grief for trying to pull this stuff out using RegEx instead of using a proper document/HTML parser, but I personally see no problem with using RegEx in this case because the HTML is so simple - and your goal is simple as well.

Try this:

// preg_match_all() collects every match, not just the first one
preg_match_all('#src="(.*?)"#', $htmlstring, $matches);
print_r($matches[1]); // capture group 1: the array of src paths you want
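
If you do go the regex route, a slightly stricter pattern is a reasonable precaution. This sketch (hypothetical extractSrcsRegex() helper) assumes src values are always double-quoted, as in the question's sample. [^"]* cannot run past the closing quote, and requiring <img means src attributes on other tags, such as <script>, are skipped.

```php
<?php
// Hypothetical helper for the regex route. Assumes src values are always
// wrapped in double quotes, as in the question's sample.
function extractSrcsRegex(string $html): array
{
    // <img\b[^>]*   : stay inside a single <img ...> tag
    // \ssrc="       : whitespace before src, so e.g. data-src is not matched
    // ([^"]*)       : capture everything up to the next double quote
    // i             : case-insensitive, so <IMG SRC="..."> also matches
    $pattern = '#<img\b[^>]*\ssrc="([^"]*)"#i';

    if (preg_match_all($pattern, $html, $matches) === false) {
        return []; // regex engine error (e.g. backtrack limit reached)
    }
    return $matches[1]; // group 1 = the captured paths, in document order
}

$html = '<img class="car-type231" src="/2f36b523259e9871bfade01983c9cc91.png" title="toyota"/>'
      . '<img class="car-type211" src="/0abc9b3ae3ba4bbcb6d3593fad6c1450.png" title="nissan"/>'
      . '<script src="/app.js"></script>';

print_r(extractSrcsRegex($html));
```

It is still not an HTML parser: single-quoted or unquoted src values, or a > inside an earlier attribute value, will trip it up.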
Sean Johnson
  • You should also look at http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Baba Oct 16 '12 at 01:05
  • Only this guy isn't trying to parse all of the complexities of HTML - he's trying to pull out all src="" attributes from a very repetitive piece of HTML that I assume is always going to be the same. There is nothing wrong with using the above solution. – Sean Johnson Oct 17 '12 at 03:31
  • The second comment on that question reflects my thoughts exactly - http://stackoverflow.com/a/1733489/755900 – Sean Johnson Oct 17 '12 at 03:33