Add comments and attributes including an incremented number to elements in an HTML string

Question

I have been trying to understand how preg_replace_callback() works, but I just don't get it.

Say, for example, I read the contents of navigation.php with file_get_contents().

That text contains a bunch of a href links and divs. I want to give them incremental ids and add an HTML comment before each a href.

How would I loop over all of them so that each one gets its id and comment with an incrementing number?

<?php
$string = file_get_contents("navigation.php");
$i = 1;
$replace = "<a ";
$with = '<!-- UNIT'.$i.' --><a id=a_'.$i;
$replace2 = "<div ";
$with2 = '<div id=b_'.$i;
preg_replace_callback()
$i++

?>

I figured that if I could see an example based on my code, I would be able to understand it better.

So $replace and $replace2 are the strings I am searching for, $with and $with2 are their respective replacements, and $i is the counter.

An example of data coming in:

<a href="page4.php">Page 4</a>
<a href="page3.php">Page 3</a>
<div class="red">stuff</div>
<div class="blue">stuff</div>

I would want output like this:

<!-- UNIT 1 --><a id="a_1" href="page4.php">Page 4</a>
<!-- UNIT 2 --><a id="a_2" href="page3.php">Page 3</a>
<div id="b_1" class="red">stuff</div>
<div id="b_2" class="blue">stuff</div>

If you hope to have answers, please add example strings and what you want to obtain. — Casimir et Hippolyte, Feb 07 '14 at 14:45

score 0 · Accepted Answer · answered Feb 07 '14 at 15:03

You have multiple goals; the simplest way to accomplish them, in my opinion, is to go step by step.

1. The RegEx

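You want to match two HTML tags, which can be caught with /(<a|<div)\b/i. The word boundary \b keeps the pattern from also matching other tags that start with the same letters, such as <abbr or <article, and the i modifier makes the match case-insensitive.

With this you can write the following code, where ??? stands for the callback built in the next step:

$parsed = preg_replace_callback('/(<a|<div)\b/i', ???, $string);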
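
A quick way to check what this kind of pattern matches is preg_match_all(). The sketch below uses a made-up sample string (not taken from the question) and compares the alternation with and without a trailing \b word boundary. Without the boundary, <a also matches the start of tags such as <abbr.

```php
<?php
// Hypothetical sample input; <abbr> is included to show the over-match.
$sample = '<a href="x.php">X</a><abbr title="y">Y</abbr><div class="red">Z</div>';

// Without a word boundary, "<a" also matches the start of "<abbr".
preg_match_all('/(<a|<div)/i', $sample, $loose);

// With \b, the tag name has to end right after "a" or "div".
preg_match_all('/(<a|<div)\b/i', $sample, $strict);

print_r($loose[1]);  // three matches: '<a', '<a', '<div'
print_r($strict[1]); // two matches: '<a', '<div'
```

Closing tags such as </a> are never matched by either pattern, because the < is followed by a slash.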

2. The callback

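The logic behind the callback can be simplified to the following switch, where $found is the matched text and $id is the current counter value:

switch ($found) {
    case '<div':
        $result = '<div id="b_'.$id.'"';
        break;
    case '<a':
        $result = '<!-- UNIT '.$id.' --><a id="a_'.$id.'"';
        break;
    default:
        $result = $found; // leave anything else unchanged
        break;
}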

To implement this you can either write a named function or use an anonymous one. A function cannot see variables from the surrounding scope by default, so you need to make $id available to it (read up on variable scope in PHP). An easy way to avoid something like global $id; is a closure with the use keyword. To be able to manipulate $id (increment it) from inside the closure, you need to import it by reference, i.e. use (&$id). This brings you to the following code:

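$id = ['<a' => 0, '<div' => 0]; // one counter per tag, so both series start at 1
$parsed = preg_replace_callback('/(<a|<div)\b/i', function ($match) use (&$id) {
    $tag = strtolower($match[1]); // the i modifier also lets <A and <DIV through
    $n = ++$id[$tag];             // increment this tag's counter and keep the new value
    switch ($tag) {
        case '<div':
            $result = '<div id="b_'.$n.'"';
            break;
        case '<a':
            $result = '<!-- UNIT '.$n.' --><a id="a_'.$n.'"';
            break;
        default:
            $result = $match[0]; // do nothing
            break;
    }

    return $result;
}, $string);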

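
The by-reference import is the part that is easiest to miss. As a minimal sketch, separate from the code above and using the hypothetical variables $byValue and $byRef, this shows the difference between use ($var) and use (&$var):

```php
<?php
$byValue = 0;
$byRef = 0;

// Imported by value: the closure works on its own copy of $byValue.
$incValue = function () use ($byValue) {
    $byValue++;
};

// Imported by reference: the closure changes the outer $byRef.
$incRef = function () use (&$byRef) {
    $byRef++;
};

$incValue();
$incValue();
$incRef();
$incRef();

echo $byValue, "\n"; // 0, the outer variable was never touched
echo $byRef, "\n";   // 2, both calls incremented the outer variable
```

Without the ampersand, a counter inside the callback would show the same value for every match.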

wow, thanks for the great example! that will help a lot. One more question on this, how would I include something with a wildcard aspect? take p tags for example...if I had — Fred Turner, Feb 07 '14 at 18:32

score 0 · Answer 2 · answered Feb 06 '23 at 09:38

I recommend not using a preg_ function at all. PHP has a robust set of tools for parsing valid HTML -- use a DOM parser.

Code:

$html = <<<HTML
<body>
<a href="page4.php">Page 4</a>
<a href="page3.php">Page 3</a>
<div class="red">stuff</div>
<div class="blue">stuff</div>
</body>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$counter = 0;
foreach ($dom->getElementsByTagName("a") as $a) {
    ++$counter;
    $comment = new DOMComment(" UNIT $counter ");
    $a->parentNode->insertBefore($comment, $a);
    $a->setAttribute('id', "a_$counter");
}
$counter = 0;
foreach ($dom->getElementsByTagName("div") as $b) {
    ++$counter;
    $b->setAttribute('id', "b_$counter");
}
echo substr($dom->saveHTML(), 7, -9);

I have wrapped your HTML in a parent body tag and removed it again at the end of the script (the substr() call strips the leading <body> line and the trailing </body> line) to help preserve the newlines of your input; otherwise some newlines are lost during processing.

The remainder of the syntax is rather self-documenting because the class methods are very descriptive of their functionality.
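
If the fixed substr() offsets feel brittle, one possible alternative is to serialize the children of the wrapper element one by one. This is only a sketch, assuming the DOM extension is enabled; the shortened input and the variables $body and $inner are made up for illustration and are not part of the code above.

```php
<?php
// Shortened, hypothetical input wrapped in a body tag as in the answer.
$html = "<body>\n<a href=\"page4.php\">Page 4</a>\n<div class=\"red\">stuff</div>\n</body>";

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Locate the wrapper element that was added around the real content.
$body = $dom->getElementsByTagName('body')->item(0);

// saveHTML() accepts a node, so each child can be serialized on its own,
// which leaves the wrapper tag out without counting characters.
$inner = '';
foreach ($body->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo trim($inner);
```

The trade-off is a few more lines in exchange for not depending on the exact length of the wrapper markup.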
