
I'm scraping information from a remote website and I'm stuck on this bit of HTML:

<div>
        <a onclick="javascripthere" href="#">
            <img width="110" height="160" alt="" src="imageurlhere">
            {variable sized string}
        </a>
        <br>2012/01/10 17:35:20<br>
        <img alt="{variable sized string}" src="imageurlhere">
</div>

From the above HTML I need to pick up the two "{variable sized string}" values. They can contain any characters (a-zA-Z0-9, spaces, and other characters), and I can't figure out what kind of regex or PHP to use to get those two strings.

Any suggestions?

Dave Siegel

3 Answers


You can use DOMDocument to do this instead of regular expressions, which are not well suited to parsing HTML or XML. For a start, your code will be much cleaner and easier to read.

For example:

$doc = new DOMDocument();
// Single quotes around the markup so the double-quoted attributes need no escaping
$doc->loadHTML('<html><body><img alt="{variable sized string}" src="imageurlhere"></body></html>');
$images = $doc->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('alt');
}

There are also a number of projects that offer easier APIs for navigating the DOM, such as phpQuery (jQuery-like selectors) and Simple HTML DOM Parser.
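
To cover both strings from the question, here is a minimal sketch using only DOMDocument. The sample values "First string" and "Second string" are placeholders I made up for the two variable-sized strings, and it assumes the fragment contains exactly one link and that the image you want is the last one.

```php
<?php
// The snippet from the question, with made-up placeholder values standing
// in for the two variable-sized strings.
$html = <<<'HTML'
<div>
        <a onclick="javascripthere" href="#">
            <img width="110" height="160" alt="" src="imageurlhere">
            First string
        </a>
        <br>2012/01/10 17:35:20<br>
        <img alt="Second string" src="imageurlhere">
</div>
HTML;

$doc = new DOMDocument();
// Remote HTML is often not perfectly valid, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// First string: the text inside the <a> element, trimmed of whitespace.
$link = $doc->getElementsByTagName('a')->item(0);
$first = $link !== null ? trim($link->textContent) : null;

// Second string: the alt attribute of the last <img> in the fragment
// (assumption: the image you want is always the last one).
$images = $doc->getElementsByTagName('img');
$lastImage = $images->length > 0 ? $images->item($images->length - 1) : null;
$second = $lastImage !== null ? $lastImage->getAttribute('alt') : null;

echo $first, "\n", $second, "\n";
```

If the real page has more than one such block, you would probably loop over the div elements and apply the same two lookups inside each one.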

Treffynnon

Don't use regular expressions to parse HTML.

Use a DOM parser. It will make your development much simpler.
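
As a rough illustration of what that can look like for the markup in the question, here is a sketch using DOMXPath on top of DOMDocument. The sample values "Title text" and "Alt text" are hypothetical, and the XPath expressions assume the structure shown in the question (a link and a standalone image that are both direct children of the div).

```php
<?php
// Hypothetical sample input modelled on the question's markup.
$html = '<div><a onclick="javascripthere" href="#">'
      . '<img width="110" height="160" alt="" src="imageurlhere"> Title text </a>'
      . '<br>2012/01/10 17:35:20<br>'
      . '<img alt="Alt text" src="imageurlhere"></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep warnings about imperfect HTML quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// normalize-space() trims the link text and collapses inner whitespace.
$linkText = $xpath->evaluate('normalize-space(//div/a)');

// Only the <img> that is a direct child of the div, not the one inside <a>.
$altText = $xpath->evaluate('string(//div/img/@alt)');

echo $linkText, "\n", $altText, "\n";
```

The queries describe where the data lives in the document rather than what the surrounding characters look like, so they tend to survive changes in attribute order or whitespace.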

Community
David

// The "s" modifier lets "." match newlines, since the link text sits on its own line.
preg_match('/<img[^>]*>(.*?)<\/a>/s', $string, $match);

// trim($match[1]) is your first string

preg_match('/<img alt="(.*?)"/', $string, $match2);

// $match2[1] is your second string.
blake305