Replace all title attributes in an HTML document

Question

I have HTML code in a variable. For example, $html equals:

<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>

I need to replace the content of every title attribute (title="Cool stuff", title="another stuff", and so on) with title="$newTitle".

Is there any non-regex way to do this?

And if I have to use regex, is there a better (performance-wise) and/or more elegant solution than what I came up with?

$html = '...';
$newTitle = 'My new title';

$matches = [];
preg_match_all(
    '/title=(\"|\')([^\"\']{1,})(\"|\')/',
    $html,
    $matches
);
$attributeTitleValues = $matches[2];

foreach ($attributeTitleValues as $title)
{
    $html = str_replace("title='{$title}'", "title='{$newTitle}'", $html);
    $html = str_replace("title=\"{$title}\"", "title=\"{$newTitle}\"", $html);
}

You should make use of `SimpleXMLElement()` to convert the html into an object so that you can XPath over the nodes which have `title="whatever"`. See https://stackoverflow.com/a/65206705/2191572 to get started. — MonkeyZeus, Dec 15 '20 at 19:44
@MonkeyZeus Ah, I didn't see the banner "There are disputes about this answer’s content ...". I've just seen that answer so many times when questions about Regex and HTML are asked — Tim Lewis, Dec 15 '20 at 19:44
To get all nodes with a title then this could be used `//*[@title]` — MonkeyZeus, Dec 15 '20 at 19:53
@MonkeyZeus thank you. That seems like the kind of solution I was looking for. — Marek Barta, Dec 16 '20 at 17:13
I can not accept this as correct answer so I will accept the other one(which was later) to close this up. But thank you again, — Marek Barta, Jan 13 '21 at 16:08

mickmackusa · Accepted Answer · 2020-12-18T10:23:43.140

Definitely don't use regex -- it is a dirty rabbit hole.
(...the hole is dirty, not the rabbit :)

I prefer to use DOMDocument and XPath to directly target all title attributes of all elements in your HTML document.

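The `LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD` flags are in place to prevent your output from being garnished with a doctype declaration and implied `<html>` and `<body>` tags.
The `//` in the XPath expression says: go to any depth in search of matches, so `//@title` selects every title attribute in the document.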

Code:

$html = <<<HTML
<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>
HTML;
$newTitle = 'My new title';

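$dom = new DOMDocument();
// Parse the markup without adding an implied <html>/<body> wrapper or a default doctype
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Visit every title attribute node at any depth and overwrite its value
foreach ($xpath->query('//@title') as $attr) {
    $attr->value = $newTitle;
}
echo $dom->saveHTML();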

Output:

<div title="My new title" alt="Cool stuff"><a title="My new title">......</a></div>
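A possible variation, combining the approach above with the `//*[@title]` expression suggested in the comments: select the elements that carry a title attribute and update them through `setAttribute()`. This is only a sketch. The function name `replaceTitleAttributes` is made up for illustration, and it assumes the markup has a single root element, as in the example.

```php
<?php
// Hypothetical helper (not part of the original answer):
// sets every title attribute in $html to $newTitle and returns the markup.
function replaceTitleAttributes(string $html, string $newTitle): string
{
    $dom = new DOMDocument();
    // Same flags as above: no implied <html>/<body> wrapper, no default doctype.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    // '//*[@title]' selects the elements that have a title attribute,
    // rather than the attribute nodes themselves.
    foreach ($xpath->query('//*[@title]') as $element) {
        $element->setAttribute('title', $newTitle);
    }

    return $dom->saveHTML();
}

$html = '<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>';
echo replaceTitleAttributes($html, 'My new title');
```

Wrapping the logic in a function keeps the DOM objects local and makes it easy to reuse on several HTML strings. Which of the two XPath expressions to use is mostly a matter of taste.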
