I am trying to replace the anchor text of every anchor tag in a given block of text with the title of the page that
its href attribute points to.
That is, my HTML contains:
"This is <a href="https://www.example.com">www.example.com</a>"
I'd like to replace it with:
"This is <a href="https://www.example.com">Example Domain</a>"
Here's my PHP code:
$domDocument = new \DOMDocument();
$domDocument->loadHTML($text, LIBXML_HTML_NOIMPLIED | LIBXML_NOERROR);
$domDocument->formatOutput = true;
$links = $domDocument->getElementsByTagName('a');
// Step 3: Iterate on Each Link
foreach($links as $link)
{
// Step 4: Extract the href attribute from the link
$href = $link->getAttribute('href');
// Step 5: Using the extracted href, fetch the page title
$title = $this->fetchPageTitle($href);
// Step 8: Replace the existing anchor text with page title
$link->nodeValue = $title;
}
// Return the modified HTML once every link has been processed
return $domDocument->saveHTML();
private function fetchPageTitle($url) :string
{
// Step 6: Fetch the contents of the page
$page_html = Http::get($url)->body();
// Step 7: Instantiate a new DOMDocument object and extract the page title
$pageDocument = new \DOMDocument();
$pageDocument->loadHTML($page_html, LIBXML_NOERROR);
$title = $pageDocument->getElementsByTagName('title')->item(0)->nodeValue;
return $title;
}
While this code works, it garbles some of the non-ASCII characters in the text.
That is, the text
Settings → Apple → Tag
is reformatted to
Settings → Apple → Tag
and We'll gets reformatted to We’ll
How do I fix this issue?
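A likely cause, assuming $text is UTF-8: DOMDocument::loadHTML() falls back to ISO-8859-1 when the markup carries no charset declaration, so multi-byte characters such as → and ’ are read as several single-byte characters. The sketch below shows one possible workaround, which is to convert every non-ASCII character to a numeric entity before loading so the parser's charset guess no longer matters. The helper name loadUtf8Fragment is hypothetical, the page title is hard-coded instead of fetched, the sample is wrapped in a <p> so the fragment has a single root element, and the mbstring extension is assumed to be available.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the original code): loads a UTF-8 HTML
 * fragment into DOMDocument without relying on charset detection.
 */
function loadUtf8Fragment(string $html): DOMDocument
{
    // Encode every code point >= 0x80 as a numeric entity, leaving pure ASCII.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    $doc->loadHTML($ascii, LIBXML_HTML_NOIMPLIED | LIBXML_NOERROR);

    return $doc;
}

// Sample input, assumed to be UTF-8.
$text = '<p>Settings → Apple → Tag, and We’ll see '
      . '<a href="https://www.example.com">www.example.com</a></p>';

$doc = loadUtf8Fragment($text);

foreach ($doc->getElementsByTagName('a') as $link) {
    // Stand-in for fetchPageTitle($href); hard-coded to keep the sketch offline.
    $link->nodeValue = 'Example Domain';
}

$out = $doc->saveHTML();
echo $out;
```

The serialized output may contain entities in place of the raw characters, which browsers render as the original → and ’. If the fetched pages are also UTF-8 and lack a charset declaration, the same helper could be tried inside fetchPageTitle(), though that is an assumption about the remote pages.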