
We are looking to add some hairline spacing to punctuation to improve the appearance of a webpage's typography. Adding the hairline spacing to change (what) to ( what ) seems pretty straightforward using str_replace, called once for each of the four main punctuation marks:

$content = str_replace("(", "( ", $content);
$content = str_replace(")", " )", $content);
$content = str_replace("?", " ?", $content);
$content = str_replace("!", " !", $content);
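Since str_replace() accepts arrays for its search and replace arguments, the four calls can also be collapsed into one. This is a minimal sketch, assuming the hairline space is the Unicode HAIR SPACE character U+200A (the same character as the `&#8202;` entity used in the answer below); the function name add_hair_spaces_naive is hypothetical. Note that it still rewrites the whole string, CSS and JS included, which is exactly the problem described next.

```php
<?php
// Hypothetical helper: inserts a hair space (U+200A) next to ( ) ? !
// across the WHOLE string, so it would also alter inline CSS and JS.
function add_hair_spaces_naive(string $content): string
{
    $hair = "\u{200A}"; // U+200A HAIR SPACE, same character as &#8202;

    // str_replace() returns a new string; it never modifies $content in place.
    // Each pair is applied to the result of the previous one; no replacement
    // contains a later search character, so nothing gets doubled.
    return str_replace(
        ['(', ')', '?', '!'],
        ['(' . $hair, $hair . ')', $hair . '?', $hair . '!'],
        $content
    );
}

var_dump(add_hair_spaces_naive('(what)') === "(\u{200A}what\u{200A})"); // bool(true)
```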

But we need to limit the replacement to the content within the main div, <div id="main">bla (bla) bla</div>, because the targeted punctuation marks (parentheses, ? and !) are also used by the CSS, JS, etc. on that page.

The pages will have been minified before the spaces are inserted, so comments, line breaks and the like will already have been stripped out and are not a concern.

Is there a way to target just a section of the content string?

A second concern is how to avoid targeting a ? within a link URL. Basically, we are trying to ignore anything inside an <a href='url'> attribute that is within the main div.

Note: this question is not a duplicate of the other one, which asked about extracting information. This one is about modifying individual punctuation characters in a webpage.

Tom
  • can you please show us the code? –  May 13 '20 at 02:08
  • Use a parser; then you can target the exact element you want. – user3783243 May 13 '20 at 02:28
  • So you have the entire HTML document in a PHP string? Can you not achieve what you want with CSS? That's what it's for, after all. – Phil May 13 '20 at 02:28
  • @Phil - yes at one point the entire html document is a PHP string. We use a CMS to generate our webpages and use a PHP script to fetch each page, minify it and store a static copy, which is served to the visitor. – Tom May 13 '20 at 02:44
  • @Phil - I know you can add space using css such as `::after` to an item like h1, but how do you add it to an individual character? – Tom May 13 '20 at 02:46
  • @Tom you're right, there's currently not any CSS that would achieve what you want. – Phil May 13 '20 at 04:20

1 Answer


What you'll need to do is load your document into DOMDocument, then select the relevant text nodes within your <div id="main"> element and replace the characters in them.

Something like this:

$find = ['(', ')', '?', '!']; // characters to find
$replace = ['(&#8202;', '&#8202;)', '&#8202;?', '&#8202;!']; // replacements (&#8202; is a hair space)

// create a "text-contains" condition for all the characters;
// `.` is the text node's own value, so every text node is checked
$selector = implode(' or ', array_map(function ($char) {
    return sprintf('contains(., "%s")', $char);
}, $find));

// create an XPath query to get the matching text nodes anywhere inside
// <div id="main">, including text that sits directly in the div itself
$query = sprintf('//div[@id="main"]//text()[%s]', $selector);

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings for HTML5 tags libxml doesn't recognize
$doc->loadHTML($content);

$xpath = new DOMXPath($doc);
$textNodes = $xpath->query($query);

foreach ($textNodes as $textNode) {
    // You need to decode the entities when working directly with text nodes
    $textNode->nodeValue = html_entity_decode(str_replace($find, $replace, $textNode->nodeValue));
}

$newContent = $doc->saveHTML();

Demo ~ https://3v4l.org/Q0fsn
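For reuse, the same approach can be wrapped in a function. This is a sketch under a few assumptions: the name add_hair_spaces is hypothetical, the hair space U+200A is inserted as a real character (so no html_entity_decode() step is needed), the XPath query selects text nodes directly so text sitting straight inside div#main is covered, and the contents of any inline <script> or <style> inside the div are skipped.

```php
<?php
// Hypothetical wrapper around the answer's DOMDocument + XPath approach.
// Assumes the page contains <div id="main"> and that only text nodes
// (never attributes such as href) should be changed.
function add_hair_spaces(string $html): string
{
    $hair = "\u{200A}"; // U+200A HAIR SPACE as a real character, not an entity
    $find = ['(', ')', '?', '!'];
    $replace = ['(' . $hair, $hair . ')', $hair . '?', $hair . '!'];

    // contains(., "(") or contains(., ")") or ...
    $selector = implode(' or ', array_map(function (string $char): string {
        return sprintf('contains(., "%s")', $char);
    }, $find));

    // Text nodes anywhere under div#main (including text directly inside it),
    // skipping the contents of any inline <script> or <style> element.
    $query = sprintf(
        '//div[@id="main"]//text()[(%s) and not(ancestor::script or ancestor::style)]',
        $selector
    );

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query($query) as $textNode) {
        // A text node's value is plain text, so the character is stored as-is.
        $textNode->nodeValue = str_replace($find, $replace, $textNode->nodeValue);
    }

    return $doc->saveHTML();
}

$page = '<html><head><script>if (a) {}</script></head><body>'
    . '<div id="main">bla (bla) bla <a href="/page?id=1">why?</a></div>'
    . '</body></html>';
echo add_hair_spaces($page);
```

Here the script in the head and the link's href keep their punctuation, while "(bla)" and "why?" get hair spaces. saveHTML() may write the hair space either as a numeric entity or as the raw character, depending on the page's declared charset; if the pages are UTF-8, make sure they carry a charset <meta> tag, since loadHTML() otherwise assumes ISO-8859-1.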

See this post regarding that html_entity_decode() caveat ~ DOM in PHP: Decoded entities and setting nodeValue
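To see why that decode step is needed: a text node's nodeValue is treated as plain text, so an entity written into it is stored literally, and saveHTML() then escapes its ampersand (giving &amp;#8202;, which shows up on the page as the visible text &#8202;). A small sketch, using a hypothetical helper text_node_value():

```php
<?php
// Hypothetical demo of the caveat: a text node's nodeValue is plain text,
// so an entity assigned to it is kept literally, not interpreted.
function text_node_value(string $value, bool $decodeFirst): string
{
    $doc = new DOMDocument();
    $p = $doc->appendChild($doc->createElement('p'));
    $text = $p->appendChild($doc->createTextNode(''));

    // Decoding first turns "&#8202;" into the real U+200A character.
    $text->nodeValue = $decodeFirst
        ? html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8')
        : $value;

    return $text->nodeValue;
}

var_dump(text_node_value('a&#8202;b', false)); // string(9) "a&#8202;b"
var_dump(text_node_value('a&#8202;b', true) === "a\u{200A}b"); // bool(true)
```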

Phil
  • Thanks, that works like a charm. If I may ask, what aspect of your code stops it from changing `?` in a url string? (always wanting to learn more) – Tom May 13 '20 at 08:26
  • This only selects text nodes, whereas `href` is an attribute. If there are URL strings in the text content, they will have the space added, though. – Phil May 13 '20 at 08:33
  • Thanks. You learn something new every day, or sometimes many somethings :) – Tom May 13 '20 at 08:38
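Building on the point above about URL strings in text content: one possible refinement, sketched here under assumptions, is to also skip text nodes that have an <a> ancestor, so a URL shown as link text keeps its ?. The function name matching_text_nodes and the sample markup are hypothetical; the function only lists which text nodes the query would modify. Bare URLs typed as ordinary text would still be matched.

```php
<?php
// Hypothetical variant of the XPath query: optionally exclude text inside <a>
// elements, e.g. <a href="...">https://example.com/?q=1</a>.
function matching_text_nodes(string $html, bool $skipLinkText): array
{
    $conditions = 'contains(., "(") or contains(., ")") or contains(., "?") or contains(., "!")';
    $exclude = $skipLinkText ? ' and not(ancestor::a)' : '';
    $query = '//div[@id="main"]//text()[(' . $conditions . ')' . $exclude . ']';

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Collect the text of each node the query selects, in document order.
    $values = [];
    foreach ($xpath->query($query) as $node) {
        $values[] = $node->nodeValue;
    }
    return $values;
}

$html = '<div id="main">Why? See <a href="/x?y=1">https://example.com/?q=1</a></div>';
print_r(matching_text_nodes($html, true));
```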