Using wildcard in Preg Match

Question

I am writing a PHP scraper, and the following piece of code grabs the title from the page by looking inside the span with class uiButtonText. Now I also want to scan for a hyperlink and have preg_match match <a href="*" class="thelink" onclick="*">(.*)</a>.

I want the stars to act as wildcards, so that I can get the hyperlink from the page even though the href and onclick values differ for each link.

if (preg_match("/<span class=\"uiButtonText\">(.*)<\/span>/i", $cache, $matches)){print($matches[1] . "\n");}else {}

My Full Code:

<?php
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
$url = "http://www.facebook.com/MauiNuiBotanicalGardens/info";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
$cache = $html;

if (preg_match("/<span class=\"uiButtonText\">(.*)<\/span>/i", $cache, $matches)) {
    print($matches[1] . "\n");
}
?>

Type google.com in address bar, search for `DOM Document in PHP`. — hjpotter92, Mar 23 '13 at 02:28
Thanks for the replies, guys. I'm still learning PHP; would it be possible to expand a bit? — Simon Staton, Mar 23 '13 at 02:30
possible duplicate of [How to parse and process HTML/XML with PHP?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml-with-php) — mario, Mar 23 '13 at 02:33

score 0 · Accepted Answer · answered Mar 23 '13 at 02:34

If you want to stick with your regex, try this:

$html = '<span class="uiButtonText"><a href="http://google.com" class="thelink" onclick="#">Google!</a></span>';

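// [^\"]* keeps each wildcard inside its own attribute value; (.*?) stops at the first </a>.
preg_match("/<span class=\"uiButtonText\"><a href=\"[^\"]*\" class=\"thelink\" onclick=\"[^\"]*\">(.*?)<\/a><\/span>/i", $html, $matches);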

print_r($matches[1]);

Output:
Google!
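
When a page holds several such links, a greedy `.*` can run past the closing quote of one attribute and into the next link. Below is a minimal sketch of one way to collect every link with preg_match_all while keeping each wildcard inside its own attribute. It assumes the attributes always appear in the order href, class, onclick; the sample markup and the names `$sample`, `$pattern` and `$links` are made up for illustration.

```php
<?php
// Hypothetical sample markup: two links on one line with different href/onclick values.
$sample = '<a href="http://a.example/" class="thelink" onclick="x()">First</a> '
        . '<a href="http://b.example/" class="thelink" onclick="y()">Second</a>';

// [^"]* matches any run of characters up to the next double quote,
// so each "wildcard" stays inside its own attribute value.
// (.*?) is lazy and stops at the first closing </a>.
$pattern = '/<a href="([^"]*)" class="thelink" onclick="[^"]*">(.*?)<\/a>/i';

// PREG_SET_ORDER returns one array per matched link.
preg_match_all($pattern, $sample, $links, PREG_SET_ORDER);

foreach ($links as $link) {
    // $link[1] is the href, $link[2] is the link text.
    echo $link[2] . ' -> ' . $link[1] . "\n";
}
```

With the sample above this should print "First -> http://a.example/" followed by "Second -> http://b.example/". If the attribute order can change from page to page, a regex like this will likely miss links, which is one reason a DOM parser tends to be more robust.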

A better way would be to use PHP Simple HTML DOM Parser and do something like this:

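// Assumes the Simple HTML DOM library file (simple_html_dom.php) has been downloaded next to this script.
include 'simple_html_dom.php';

$html = file_get_html("http://www.facebook.com/MauiNuiBotanicalGardens/info");
foreach ($html->find("a.thelink") as $link) {
    echo $link->innertext . "<br>";
}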

The above is not tested, but it should work.
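
If you would rather not add a third-party library, PHP's built-in DOMDocument (the class the comments point to) can do the same job. This is only a sketch run against a hard-coded sample string rather than the real page; `$sample` and `$results` are illustrative names, and the XPath test assumes the class attribute is exactly "thelink" with no other classes alongside it.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$sample = '<div><span class="uiButtonText">'
        . '<a href="http://google.com" class="thelink" onclick="#">Google!</a>'
        . '</span></div>';

$doc = new DOMDocument();
// Collect parser warnings internally; real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$doc->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select every <a> whose class attribute is exactly "thelink",
// whatever its href and onclick happen to be.
$results = [];
foreach ($xpath->query('//a[@class="thelink"]') as $a) {
    $results[] = [
        'text' => $a->textContent,
        'href' => $a->getAttribute('href'),
    ];
}

print_r($results);
```

In your scraper you would pass the string returned by curl_exec to loadHTML in place of `$sample`. No wildcards are needed here, because the query never looks at href or onclick when selecting the element.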
