Your question actually breaks down into two smaller problems. You've already solved one of them: parsing the URL with a regular expression. The second is extracting text from HTML, which is not something a regular expression handles well. The confusion comes from trying to do both at the same time with a single regular expression (parsing the HTML and parsing the URL). See the Stack Overflow answer on parsing HTML with regex for more details on why this is a bad idea.
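To make the problem concrete, here is a minimal sketch of what happens when a URL regex is run over the raw HTML string. The sample markup and the variable names are made up for illustration, and the pattern is just a simple URL matcher, not necessarily the one from your question.

```php
<?php
// Hypothetical input: one URL in plain text, one already inside an attribute
$html = '<p>See http://example.com/page and <img src="http://example.com/logo.png"></p>';

// Naive approach: run the URL regex over the raw HTML string
$naive = preg_replace('~https?://\S{4,}~i', '<a href="$0">$0</a>', $html);

echo $naive, "\n";
```

The URL in the text is linked as intended, but the regex cannot tell that the second URL sits inside a src attribute, so an a tag gets wedged into the middle of the img tag and the markup is broken.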
So instead, let's use an HTML parser (like DOMDocument) to extract the text nodes from the HTML and parse URLs only inside those text nodes.
Here's an example:
<?php
$html = <<<'HTML'
<p>This is a URL http://abcd/ims in text</p>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Walk the entire DOM tree depth-first, yielding every node
function walk(DOMNode $node, $skipParent = false) {
    if (!$skipParent) {
        yield $node;
    }
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $n) {
            yield from walk($n);
        }
    }
}

// http:// or https:// followed by at least 4 non-whitespace characters
$pattern = '~https?://\S{4,}~i';

// Snapshot the nodes first (keys are discarded because "yield from" repeats them),
// so that inserting new nodes below cannot disturb the traversal
foreach (iterator_to_array(walk($dom->firstChild), false) as $node) {
    if ($node instanceof DOMText) {
        // Peel off one URL at a time so every URL in the text node gets linked
        while (preg_match($pattern, $node->nodeValue, $match)) {
            // Split around the first URL: [text before it, text after it]
            [$before, $after] = preg_split($pattern, $node->nodeValue, 2);
            $node->nodeValue = $after;
            // Add the URL as a text node so characters like & are escaped properly
            $href = $dom->createElement('a');
            $href->appendChild($dom->createTextNode($match[0]));
            $href->setAttribute('href', $match[0]);
            $node->parentNode->insertBefore($dom->createTextNode($before), $node);
            $node->parentNode->insertBefore($href, $node);
        }
    }
}
echo $dom->saveHTML();
Which gives you the desired HTML as output:
<p>This is a URL <a href="http://abcd/ims">http://abcd/ims</a> in text</p>
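As an aside, if you'd rather not write the tree walker by hand, the text nodes can also be selected with an XPath query. This is only a sketch of that selection step, with made-up sample markup and variable names; it shows that a URL inside an attribute never reaches the URL-matching code.

```php
<?php
// Hypothetical input: a URL in the text and another inside an attribute
$html = '<p>See http://example.com/page and <img src="http://example.com/logo.png"></p>';

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// "//text()" selects every text node in the document; attribute values
// such as src="..." are not text nodes, so they are never returned
$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query('//text()') as $textNode) {
    $texts[] = $textNode->nodeValue;
}

var_dump($texts);
```

Only the text around the first URL shows up in $texts, so the image URL is left alone no matter what the regex looks like.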