I am struggling to convert HTML links into plain text while keeping the rest of the HTML structure intact.

I need to convert this part of an HTML page

<div>
    <p>text text bla blah</p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>

into this

<div>
    <p>text text bla blah</p>
    <p>Cool website https://google.com</p>
    <p>Cool website https://google.com</p>
</div>

I found a nice script in "PHP regex: How to convert HTML string with links into plain text that shows URL after text in brackets". It collects the HTML links and converts them into plain text, but that is only part of the job.

This is what I have so far:

$htmlString = '
<div>
    <p>text text bla blah</p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);

$links = [];
$linksAsString = '';

foreach ($xpath->query('//a') as $linkElement)
{
    $link = [
        'href' => $linkElement->getAttribute('href'),
        'text' => $linkElement->textContent
    ];
    $links[] = $link;

    $linksAsString .= $link['text'] . " {$link['href']}<br/>";
}
libxml_clear_errors();

echo $linksAsString;

The current code outputs only the converted links, without the surrounding HTML:

Cool website https://google.com
Cool website https://google.com

I would appreciate some help.

tiktikis

2 Answers

You could use str_replace, replacing each full <a> element in the original string with its text followed by its href.

<?php
$htmlString = '
<div>
    <p>text text bla blah</p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $linkElement)
{
    $htmlString = str_replace($dom->saveHTML($linkElement), $linkElement->textContent . ' ' . $linkElement->getAttribute('href'), $htmlString);
}
libxml_clear_errors();

echo $htmlString;

Output:

<div>
    <p>text text bla blah</p>
    <p>Cool website https://google.com</p>
    <p>Cool website https://google.com</p>
</div>

Demo: https://eval.in/830127
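
One caveat with this approach: the replacement only happens when the markup that saveHTML produces for a link matches the original string exactly. Below is a minimal sketch of the same idea that also records links it could not find. The function name replaceLinksInString and the $missed parameter are hypothetical, and the htmlspecialchars call is an added assumption so that decoded text is safe to put back into HTML.

```php
<?php
// Hypothetical helper, not part of the answer above: same str_replace idea,
// but it collects links whose serialized markup was not found verbatim.
function replaceLinksInString(string $html, array &$missed = []): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $seen = [];
    foreach ($xpath->query('//a') as $a) {
        $serialized = $dom->saveHTML($a);
        if (isset($seen[$serialized])) {
            // str_replace already handled every identical copy of this link
            continue;
        }
        $seen[$serialized] = true;

        // textContent and getAttribute return decoded text, so re-escape it
        $plain = htmlspecialchars($a->textContent . ' ' . $a->getAttribute('href'));

        $html = str_replace($serialized, $plain, $html, $count);
        if ($count === 0) {
            // The serializer's output differs from the source markup
            $missed[] = $serialized;
        }
    }
    return $html;
}
```

With the sample markup from the question, $missed should stay empty. A link written with single-quoted attributes would likely end up in $missed, since the serializer writes double quotes.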

chris85

It's a bit of a pain, but using DOM can achieve what you're after. You just need to mess around a bit to get the right text in the right place...

<?php
$htmlString = '
<div>
    <p>text text bla blah</p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//a') as $linkElement)
{
    // Build the replacement text: link text followed by the URL
    $linkAsString = $linkElement->textContent . " " . $linkElement->getAttribute('href');
    $newText = $dom->createTextNode($linkAsString);
    // Swap the <a> element for the text node in place, so sibling order is preserved
    $linkElement->parentNode->replaceChild($newText, $linkElement);
}
libxml_clear_errors();

echo $dom->saveXML();

Gives...

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div>
    <p>text text bla blah</p>
    <p>Cool website https://google.com</p>
    <p>Cool website https://google.com</p>
</div></body></html>
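
If the XML declaration, doctype and html/body wrapper are not wanted, one option is to serialize only the children of body. The following is a rough sketch of that variation, not a definitive implementation. The function name linksToTextDom is hypothetical, and it assumes the input is a fragment that libxml places inside body.

```php
<?php
// Hypothetical helper: DOM-based conversion that returns only the fragment,
// without the doctype and <html>/<body> wrapper that loadHTML adds.
function linksToTextDom(string $html): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@href]') as $a) {
        $text = $a->textContent . ' ' . $a->getAttribute('href');
        // replaceChild keeps the text at the link's original position
        $a->parentNode->replaceChild($dom->createTextNode($text), $a);
    }

    // Serialize each child of <body> instead of the whole document
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return $out;
}

$htmlString = '
<div>
    <p>text text bla blah</p>
    <p><a href="https://google.com" rel="nofollow" target="_blank" title="google">Cool website</a></p>
</div>
';
echo linksToTextDom($htmlString);
```

Whitespace in the result may differ slightly from the input, since the parser drops some whitespace-only text around the top-level element.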
Nigel Ren