I'm trying to wrap all instances of certain phrases in a <span> using PHP's DOMDocument and XPath. I've based my logic on this answer from another post, but it only lets me select the first match within a node, when I need to select all matches.
Once I modify the DOM for the first match, subsequent iterations fail with Fatal error: Uncaught Error: Call to a member function splitText() on bool at the line that begins with $after. I'm pretty sure this is caused by modifying the markup, but I've been unable to figure out why.
What am I doing wrong here?
/**
 * Automatically wrap various forms of CCJM in a class for branding purposes
 *
 * @link https://stackoverflow.com/a/6009594/654480
 *
 * @param string $content
 * @return string
 */
function ccjm_branding_filter(string $content): string {
    if (! (is_admin() && ! wp_doing_ajax()) && $content) {
        $DOM = new DOMDocument();

        /**
         * Use internal errors to get around HTML5 warnings
         */
        libxml_use_internal_errors(true);

        /**
         * Load in the content, with proper encoding and an `<html>` wrapper required for parsing
         */
        $DOM->loadHTML("<?xml encoding='utf-8' ?><html>{$content}</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

        /**
         * Clear errors to get around HTML5 warnings
         */
        libxml_clear_errors();

        /**
         * Initialize XPath
         */
        $XPath = new DOMXPath($DOM);

        /**
         * Retrieve all text nodes, except those within scripts
         */
        $text = $XPath->query("//text()[not(parent::script)]");

        foreach ($text as $node) {
            /**
             * Find all matches, including offset
             */
            preg_match_all("/(C\.? ?C\.?(?:JM| Johnson (?:&|&|&|and) Malhotra)(?: Engineers, LTD\.?|, P\.?C\.?)?)/i", $node->textContent, $matches, PREG_OFFSET_CAPTURE);

            /**
             * Wrap each match in appropriate span
             */
            foreach ($matches as $group) {
                foreach ($group as $key => $match) {
                    /**
                     * Determine the offset and the length of the match
                     */
                    $offset = $match[1];
                    $length = strlen($match[0]);

                    /**
                     * Isolate the match and what comes after it
                     */
                    $word = $node->splitText($offset);
                    $after = $word->splitText($length);

                    /**
                     * Create the wrapping span
                     */
                    $span = $DOM->createElement("span");
                    $span->setAttribute("class", "__brand");

                    /**
                     * Replace the word with the span, and then re-insert the word within it
                     */
                    $word->parentNode->replaceChild($span, $word);
                    $span->appendChild($word);

                    break; // it always errors after the first loop
                }
            }
        }

        /**
         * Save changes, remove unneeded tags
         */
        $content = implode(array_map([$DOM->documentElement->ownerDocument, "saveHTML"], iterator_to_array($DOM->documentElement->childNodes)));
    }

    return $content;
}
add_filter("ccjm_final_output", "ccjm_branding_filter");
Example content (all instances of "C.C. Johnson & Malhotra, P.C." and "CCJM" are matched, but only the first can be successfully modified):
C.C. Johnson & Malhotra, P.C. (CCJM) was an integral member of a large Design Team for a 16.5-mile-long Public-Private Partnership (P3) Purple Line Project. The east-west light rail system extends from New Carrollton in PG County, MD to Bethesda in MO County, MD with 21 stations and one short tunnel. CCJM was Engineer of Record (EOR) for the design of eight (8) Bridges and design reviews for 35 transit/highway bridges and over 100 retaining walls of different lengths/types adjacent to bridges and in areas of cut/fill. CCJM designed utility structures for 42,000 LF of relocated water mains and 19,000 LF of relocated sewer mains meeting Washington Suburban Sanitary Commission (WSSC), Md Dept of Transportation (MDOT) MTA, and Local Standards.
EDIT 1: Doing some testing, when I output $node->textContent, I see that it changes after the first loop. So I think what's happening is that $node->splitText($offset) shortens $node itself to the text before the offset, which means the offsets recorded for later matches no longer point inside it.
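If that diagnosis is right, one possible workaround is to walk the matches of each text node from last to first, since a split only removes text to the right of the split point and leaves earlier offsets valid. Below is a minimal sketch of that idea, not a drop-in replacement for the filter above: the sample markup, the simplified /CCJM/ pattern, and the variable names are all assumptions made for illustration. It also reads only $matches[0], because iterating over every group of $matches (as the code above does) would visit each match once for the full match and once more for the capture group. Converting the byte offsets from preg_match_all() into character offsets is a precaution for multibyte content.

```php
<?php
// Hypothetical sample markup, not the real page content.
$dom = new DOMDocument();
$dom->loadHTML('<p>CCJM built it. CCJM designed it.</p>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

// Copy the node list into an array first, because the loop below modifies the DOM.
$textNodes = iterator_to_array($xpath->query('//text()[not(parent::script)]'));

foreach ($textNodes as $node) {
    // Simplified stand-in pattern; $matches[0] holds each full match with its byte offset.
    preg_match_all('/CCJM/', $node->data, $matches, PREG_OFFSET_CAPTURE);

    // Process matches from last to first: splitText() shortens $node,
    // but only to the right of the split point, so earlier offsets stay valid.
    foreach (array_reverse($matches[0]) as [$text, $byteOffset]) {
        // preg_match_all() reports byte offsets; splitText() counts characters.
        $offset = mb_strlen(substr($node->data, 0, $byteOffset), 'UTF-8');
        $length = mb_strlen($text, 'UTF-8');

        // Isolate the match: $word starts at the match, the second split cuts off the tail.
        $word = $node->splitText($offset);
        $word->splitText($length);

        // Wrap the isolated match in the branding span.
        $span = $dom->createElement('span');
        $span->setAttribute('class', '__brand');
        $word->parentNode->replaceChild($span, $word);
        $span->appendChild($word);
    }
}

$result = trim($dom->saveHTML());
echo $result, PHP_EOL;
```

An alternative under the same assumptions would be to keep iterating forward but continue splitting from the remainder node returned by the second splitText() call, subtracting the text already consumed from each later offset. The reverse walk avoids that bookkeeping.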