2

I've been searching for a solution to this but haven't found quite the right thing yet.

The situation is this: I need to find all links on a page with a given class (say class="tracker") and then append query string values on the end, so when a user loads a page, those certain links are updated with some dynamic information.

I know how this can be done with JavaScript, but I'd really like to run it server-side instead. I'm quite new to PHP; from the looks of it, XPath might be what I'm looking for, but I haven't found a suitable example to get started with. Is there anything like GetElementByClass?

Any help would be greatly appreciated!

Shadowise

  • possible duplicate of [Finding links matching given string in xpath/domdocument query](http://stackoverflow.com/questions/5251282/finding-links-matching-given-string-in-xpath-domdocument-query) – Gordon Apr 13 '11 at 11:23
  • possible duplicate of [Grabbing the href attribute for an a element](http://stackoverflow.com/questions/3820666/regular-expression-for-grabbing-the-href-attribute-of-an-a-element/3820783#3820783) – Gordon Apr 13 '11 at 11:24
  • possible duplicate of [Replace values in a URI query string](http://stackoverflow.com/questions/3777481/replace-values-in-a-uri-query-string) – Gordon Apr 13 '11 at 11:25
  • the three above should contain all the information you need to solve this – Gordon Apr 13 '11 at 11:26
  • @Gordon You are *still* the dupe finding king, and of course, after I have answered :P. Throwing my vote in too. – alex Apr 13 '11 at 11:31
  • @Gordon Thanks, I will check these out! – Shadowise Apr 13 '11 at 12:16
  • @Gordon: From the XPath point of view this is also a duplicate of http://stackoverflow.com/questions/5304791/php-simple-xpath-question –  Apr 13 '11 at 17:06

3 Answers

3

Is there anything like GetElementByClass?

Here is an implementation I whipped up...

function getElementsByClassName(DOMDocument $domNode, $className) {
    $elements = $domNode->getElementsByTagName('*');
    $matches = array();
    foreach($elements as $element) {
        if ( ! $element->hasAttribute('class')) {
            continue;
        }
        $classes = preg_split('/\s+/', $element->getAttribute('class'));
        if ( ! in_array($className, $classes)) {
            continue;
        }
        $matches[] = $element;
    }
    return $matches;
}

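The following complete example applies the same class check inline, without the helper function above, and appends a query string value to each matching link.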

$str = '<body>
    <a href="">a</a>
        <a href="http://example.com" class="tracker">a</a>
        <a href="http://example.com?hello" class="tracker">a</a>
    <a href="">a</a>
</body>
    ';

$dom = new DOMDocument;

$dom->loadHTML($str);

$anchors = $dom->getElementsByTagName('body')->item(0)->getElementsByTagName('a');

foreach($anchors as $anchor) {

    if ( ! $anchor->hasAttribute('class')) {
        continue;
    }

    $classes = preg_split('/\s+/', $anchor->getAttribute('class'));

    if ( ! in_array('tracker', $classes)) {
        continue;
    }

    $href = $anchor->getAttribute('href');

    $url = parse_url($href);

    $attach = 'stackoverflow=true';

    if (isset($url['query'])) {
        $href .= '&' . $attach;
    } else {
        $href .= '?' . $attach;
    }

    $anchor->setAttribute('href', $href);
}

echo $dom->saveHTML();

Output

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
    <a href="">a</a>
        <a href="http://example.com?stackoverflow=true" class="tracker">a</a>
        <a href="http://example.com?hello&amp;stackoverflow=true" class="tracker">a</a>
    <a href="">a</a>
</body></html>
alex
  • Thanks! This looks like a good place to get started. To start, can I change `$dom->loadHTML($str);` to `$dom->loadHTML($html);` to parse the whole page rather than loading a string? – Shadowise Apr 13 '11 at 12:11
  • I think I gather how your code works, but how would I modify the `$dom->loadHTML($str)` to parse the current page for links, then edit those links in place? I need to run the script on existing pages, so passing in strings isn't really viable. – Shadowise Apr 13 '11 at 16:00
  • @Shadowise You'd need to use this to *preprocess* your pages. It would be much easier if you were using views on your page; you could run them through this function before outputting them. – alex Apr 13 '11 at 23:05
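
One way to read the preprocessing suggestion above is to capture the page with output buffering and rewrite it before it is sent. The following is a minimal sketch under stated assumptions, not the answerer's code: the function name `appendToTrackerLinks` is hypothetical, and it assumes the buffered output is a complete HTML page that `DOMDocument::loadHTML` can parse.

```php
<?php
// Hypothetical helper: appends $attach to the href of every <a> whose
// class list contains $className, then returns the rewritten HTML.
function appendToTrackerLinks(string $html, string $className, string $attach): string
{
    $dom = new DOMDocument;

    // Real-world markup is rarely perfectly valid, so collect parser
    // warnings internally instead of printing them into the page.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // A missing class attribute yields '', which never matches.
        $classes = preg_split('/\s+/', trim($anchor->getAttribute('class')));
        if (!in_array($className, $classes, true)) {
            continue;
        }

        $href = $anchor->getAttribute('href');
        // Use '&' when the URL already carries a query string, '?' otherwise.
        $hasQuery = is_string(parse_url($href, PHP_URL_QUERY));
        $anchor->setAttribute('href', $href . ($hasQuery ? '&' : '?') . $attach);
    }

    return $dom->saveHTML();
}

// Usage sketch (assumption: called before the page produces any output).
// The callback receives the buffered page when the buffer is flushed.
// ob_start(function (string $buffer): string {
//     return appendToTrackerLinks($buffer, 'tracker', 'stackoverflow=true');
// });
```

Keeping the rewrite in a plain string-to-string function means the same code can be called from an output buffer callback or from a view layer, whichever the site already has.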
2

I need to find all links on a page with a given class (say class="tracker") [...] I'm quite new to PHP, but from the looks of it, XPath might be what I'm looking for but I haven't found a suitable example to get started with. Is there anything like GetElementByClass?

This XPath 1.0 expression:

//a[contains(
       concat(' ',normalize-space(@class),' '),
       ' tracker '
    )
]
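
To run that expression from PHP, it can be passed to `DOMXPath::query`. This is a minimal sketch, assuming the markup is loaded with `DOMDocument::loadHTML`; the sample markup and the `$trackers` variable are illustrative and not part of the answer.

```php
<?php
$html = '<body>
    <a href="http://example.com/1" class="tracker">first</a>
    <a href="http://example.com/2" class="foo tracker bar">second</a>
    <a href="http://example.com/3" class="atrackerb">third</a>
    <a href="http://example.com/4">fourth</a>
</body>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Padding @class with spaces means " tracker " matches only a whole class
// token, whether it is first, last, or in the middle of the list.
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " tracker ")]';

// Collect the href of every matching anchor.
$trackers = array();
foreach ($xpath->query($query) as $anchor) {
    $trackers[] = $anchor->getAttribute('href');
}

print_r($trackers);
```

Only the first two links are selected here; `atrackerb` contains the substring but is not a whole class token.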
  • BTW, would that work if it were the first or last class (considering there would not be a space either side in that example)? – alex Apr 13 '11 at 23:12
  • @alex: Yes. That's why the concatenation. –  Apr 13 '11 at 23:13
0

A bit shorter, using XPath:

$dom = new DomDocument();
$dom->loadXml('<?xml version="1.0" encoding="UTF-8" ?>
<root>
    <a href="somlink" class="tracker foo">label</a>
    <a href="somlink" class="foo">label</a>
    <a href="somlink">label</a>
    <a href="somlink" class="atrackerb">label</a>
    <a href="somlink">label</a>
    <a href="somlink" class="tracker">label</a>
    <a href="somlink" class="tracker">label</a>
</root>');

$xpath = new DomXPath($dom);

foreach ($xpath->query('//a[contains(@class, "tracker")]') as $node) {
    if (preg_match('/\btracker\b/', $node->getAttribute('class'))) {
        $node->setAttribute(
            'href',
            $node->getAttribute('href') . '#some_extra'
        );
    }

}

header('Content-Type: text/xml; charset=UTF-8');
echo $dom->saveXml();
Yoshi
  • @Yoshi That appears to match elements such as ``. That is why I went with splitting them :) – alex Apr 13 '11 at 11:32
  • @Yoshi Not really matching the class then, is it? :P – alex Apr 13 '11 at 11:34
  • @alex true, but this could also be done in the foreach loop ;) (changed the example) – Yoshi Apr 13 '11 at 11:42
  • @Yoshi That will do the trick, however, can't you [use regular expressions in XPath selectors](http://www.regular-expressions.info/xpath.html) as well? – alex Apr 13 '11 at 11:46
  • @alex Yes, but php doesn't know the 'matches' function. In 5.3 one could use: registerPhpFunctions and then use preg_match in the query. – Yoshi Apr 13 '11 at 11:51
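
The `registerPhpFunctions` approach mentioned in the last comment might look like the sketch below. It assumes PHP 5.3 or later with the DOM extension; the regular expression, the sample XML, and the `$matched` variable are illustrative choices, not taken from the thread.

```php
<?php
$dom = new DOMDocument;
$dom->loadXML('<root>
    <a href="one" class="tracker foo">label</a>
    <a href="two" class="atrackerb">label</a>
    <a href="three" class="tracker">label</a>
    <a href="four">label</a>
</root>');

$xpath = new DOMXPath($dom);

// The "php" prefix must be bound to this namespace before PHP functions
// can be called from inside an expression.
$xpath->registerNamespace('php', 'http://php.net/xpath');

// Expose only preg_match rather than every PHP function.
$xpath->registerPhpFunctions('preg_match');

// functionString passes @class to preg_match as a string. preg_match
// returns 1 on a match, so the result is compared explicitly with 1.
$query = '//a[php:functionString("preg_match", "/(^|\s)tracker(\s|$)/", @class) = 1]';

$matched = array();
foreach ($xpath->query($query) as $node) {
    $matched[] = $node->getAttribute('href');
}

print_r($matched);
```

The explicit `= 1` comparison matters: without it, the predicate would test the raw return value, which is not a reliable truth test inside XPath.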