
If I have understood correctly, you should avoid regular expressions when it comes to finding things in HTML. What is a good alternative that is built into standard PHP?

In my particular case right now I want to find all the image tags with their src, alt, height and width attributes. Later I'd also want to find certain meta tags. Either way, how would you do this with PHP?

The PHP version on my webhost is currently 5.2.x.

Svish
  • duplicate of [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) and [most of your DOM usecases are covered in here](http://stackoverflow.com/search?q=user%3A208809+dom), for instance [Grabbing the href attribute of an a element](http://stackoverflow.com/questions/3820666/grabbing-the-href-attribute-of-an-a-element/3820783#3820783) – Gordon Apr 13 '11 at 17:46
  • @Gordon: You should rename your account to `GorDOM` :) – drudge Apr 13 '11 at 18:46

2 Answers


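You can always use PHP's built-in DOM extension: load the markup into a `DOMDocument` and use its methods, such as `getElementsByTagName()`, to find the elements you need.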
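As a rough illustration of the DOM approach for the image case in the question, here is a minimal sketch. The sample markup and the `$html` and `$images` variable names are made up for the example; it sticks to `array()` syntax so it should also run on PHP 5.2.

```php
<?php
// Hypothetical sample markup; in practice this would be the page you fetched.
$html = '<html><body>'
      . '<img src="a.jpg" alt="First" width="100" height="50">'
      . '<img src="b.png" alt="Second">'
      . '</body></html>';

// Collect parser warnings internally instead of printing them for sloppy HTML.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$images = array();
foreach ($dom->getElementsByTagName('img') as $img) {
    // getAttribute() returns an empty string when the attribute is absent.
    $images[] = array(
        'src'    => $img->getAttribute('src'),
        'alt'    => $img->getAttribute('alt'),
        'width'  => $img->getAttribute('width'),
        'height' => $img->getAttribute('height'),
    );
}

print_r($images);
```

Because missing attributes come back as empty strings, `hasAttribute()` is the call to use if you need to tell "missing" apart from "present but empty".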

Naftali

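The `DOMXPath` class allows you to run XPath queries against a document loaded into a `DOMDocument`. Since `DOMDocument::loadHTML()` parses HTML, this works for HTML as well as XML. XPath lets you select specific elements and attributes from the document tree. It is language neutral (like regular expressions), and practically every mainstream programming language has an implementation of it.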

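```php
// Parse the HTML string into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><img src="image.jpg" /></body></html>');
// "//img" matches every <img> element anywhere in the document.
$xpath = new DOMXPath($dom);
$allImgNodes = $xpath->query("//img");
```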
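To go from the node list to the actual attribute values, and to cover the meta tag case from the question, something along these lines should work. This is only a sketch: the sample markup, the `description` meta name, and the `$sources` and `$description` variables are assumptions for illustration.

```php
<?php
// Hypothetical sample markup with two meta tags and two images.
$html = '<html><head>'
      . '<meta name="description" content="Demo page">'
      . '<meta name="keywords" content="php, dom, xpath">'
      . '</head><body>'
      . '<img src="a.jpg" alt="First" width="100" height="50">'
      . '<img alt="No source">'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The [@src] predicate keeps only <img> elements that have a src attribute.
$sources = array();
foreach ($xpath->query('//img[@src]') as $img) {
    $sources[] = $img->getAttribute('src');
}

// An XPath expression can also select an attribute node directly.
$description = '';
$nodes = $xpath->query('//meta[@name="description"]/@content');
if ($nodes->length > 0) {
    $description = $nodes->item(0)->nodeValue;
}

print_r($sources);
echo $description, "\n";
```

The predicate syntax is where XPath pays off compared with `getElementsByTagName()`: the filtering happens in the query rather than in a PHP loop.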
Michael