find url parameter with preg_match

Question

I am parsing my website's HTML with cURL:

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "http://example.com/product.html");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

$content = curl_exec($ch);

Now I want to find a specific <span> that contains an <a> tag whose href carries a parameter. Is it possible to find this parameter ([eventUid]=22) with preg_match? The 22 is an ID that comes from a database, and I want to save it to a variable using PHP.

Example:

<span><a title="mytitle" href="http://example.com/products.html?tx_example_pi1[eventUid]=22">example</a></span>

if (preg_match('@((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@', $content, $matches)) {
    echo $matches[2];
} else {
    echo 'Nothing found!';
}

At the moment, this pattern only finds links in general, not the parameter.

Just a suggestion: why not use parse_str()? It's much faster. – Dinesh, Apr 09 '13 at 08:01
Doing that with regular expressions looks terribly complicated. I'd suggest simplifying and using [DOM functions](http://www.php.net/manual/en/book.dom.php) and [parse_url()](http://php.net/parse_url) instead. – Álvaro González, Apr 09 '13 at 08:03
If you found the link, why don't you simply split the string on '=' and take the ID (22)? – Raheel Hasan, Apr 09 '13 at 08:03
I can't find the link I am searching for. I will try parse_url(). – Jim, Apr 09 '13 at 08:09
Possible duplicate of [How to parse and process HTML/XML?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml) – hjpotter92, Apr 09 '13 at 08:13

Ja͢ck · Accepted Answer · 2013-04-09T09:30:45.350

Using regular expressions to search through HTML is error-prone; it's better to use XPath for that:

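// $content is the HTML string returned by curl_exec()
$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);

// Select every <a> directly inside a <span> whose href contains "[eventUid]="
foreach ($xp->query('//span/a[contains(@href, "[eventUid]=")]') as $anchor) {
    // Capture the digits that follow "[eventUid]="
    if (preg_match('/\[eventUid\]=(\d+)/', $anchor->getAttribute('href'), $matches)) {
        echo $matches[1];
    }
}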
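
As a variation on the loop above, the parameter can also be read without a second regular expression, along the lines of the parse_url() and parse_str() suggestions in the question comments. The following is a minimal sketch and not part of the original answer: the helper name extractEventUid() and the $html sample variable are assumptions made for illustration, while the parameter name tx_example_pi1[eventUid] and the markup come from the question.

```php
<?php
// Hypothetical helper: returns the eventUid values found in <span><a href="..."> links.
function extractEventUid(string $html): array
{
    $doc = new DOMDocument;
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    $ids = [];

    foreach ($xp->query('//span/a[contains(@href, "[eventUid]=")]') as $anchor) {
        // Take only the query string of the href ...
        $query = parse_url($anchor->getAttribute('href'), PHP_URL_QUERY);
        if (!is_string($query)) {
            continue;
        }
        // ... and let PHP expand tx_example_pi1[eventUid]=22 into a nested array.
        parse_str($query, $params);
        if (isset($params['tx_example_pi1']['eventUid'])) {
            $ids[] = (int) $params['tx_example_pi1']['eventUid'];
        }
    }

    return $ids;
}

// Sample markup taken from the question.
$html = '<span><a title="mytitle" href="http://example.com/products.html?tx_example_pi1[eventUid]=22">example</a></span>';
$ids = extractEventUid($html);
var_dump($ids);
```

For the sample markup this should yield a single ID, 22. Returning an array rather than echoing keeps the result usable when a page contains several matching links.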

edited Apr 09 '13 at 09:30

answered Apr 09 '13 at 08:13


What did you mean by $content? The URL of my website? Also, foreach ($xp->query('//span/a[contains(@href, "[eventUid]=22")]') is not possible because the number is dynamic. Wouldn't foreach ($xp->query('//span/a[contains(@href, "[eventUid]=")]') be better? – Jim Apr 09 '13 at 09:30
@Jim You already have `$content` coming from `curl_exec()`. I've updated the XPath and updated the code inside the loop. – Ja͢ck Apr 09 '13 at 09:31
Thanks, but it doesn't enter the foreach loop :( var_dump($xp) returns this: object(DOMXPath)#179 (0) { } – Jim Apr 09 '13 at 09:44
@Jim Well, it works [here](http://codepad.viper-7.com/IABtBw), which is the HTML you gave earlier. – Ja͢ck Apr 09 '13 at 09:46
Hmm, yes, your example works. It seems to be a problem with parsing the website: http://codepad.viper-7.com/E0rccz – Jim Apr 09 '13 at 11:13
`$content = curl_exec($ch);` gives string(14909). `$doc = new DOMDocument; var_dump($doc);` gives `object(DOMDocument)#178 (0) { }`. After `$doc->loadHTML($content);`, `var_dump($doc);` still gives `object(DOMDocument)#178 (0) { }`. `$xp = new DOMXPath($doc); var_dump($xp);` gives `object(DOMXPath)#179 (0) { }` – Jim Apr 09 '13 at 12:15
@Jim $doc->saveHTML() should show you what it managed to parse. – Ja͢ck Apr 09 '13 at 12:31
`$doc->saveHTML($content)` gives me string(14909) via var_dump, and with echo it prints the whole website – Jim Apr 09 '13 at 12:41
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/27858/discussion-between-jim-and-jack) – Jim Apr 09 '13 at 12:45
@Jim Well, the website doesn't contain any links with `[eventUid]`; see [here](http://codepad.viper-7.com/NL3mfm). – Ja͢ck Apr 09 '13 at 12:46
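
The debugging steps from this comment thread can be condensed into a short check. This is only a sketch under assumptions: the helper name countEventLinks() and the two sample strings are hypothetical, whereas in the thread $content came from curl_exec(). Since var_dump() on a DOMDocument or DOMXPath object printed an empty-looking object in the output Jim posted, the node count of the XPath result is probably the more useful thing to inspect.

```php
<?php
// Hypothetical helper: counts <span><a> links whose href contains "[eventUid]=".
function countEventLinks(string $content): int
{
    $doc = new DOMDocument;
    // Keep parser warnings about malformed HTML out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    $nodes = $xp->query('//span/a[contains(@href, "[eventUid]=")]');

    // A DOMNodeList exposes its size via ->length; 0 means a foreach over it never runs.
    return $nodes === false ? 0 : $nodes->length;
}

// Sample inputs (assumptions): one page with the parameter, one without.
$withLink = '<span><a href="http://example.com/products.html?tx_example_pi1[eventUid]=22">example</a></span>';
$withoutLink = '<span><a href="http://example.com/products.html">example</a></span>';

echo countEventLinks($withLink), "\n";
echo countEventLinks($withoutLink), "\n";
```

A count of 0 for the fetched page would match the conclusion of the thread: the document parsed fine, but it contained no link with `[eventUid]`.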
