preg_replace href anchor with anchor text

Question

How can I replace every anchor with its anchor text? My code is:

$body='<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

I want the result to be:

<p>The man was dancing like a little boy while all kids were watching ... </p>

I used:

$body= preg_replace('#<a href="https?://(?:.+\.)?ok.co.*?>.*?</a>#i', '$1', $body);

and the result is:

<p>The man was while all kids were watching ... </p>

When should the ellipsis ... appear? After a certain number of words or characters? – Indrasis Datta Oct 01 '16 at 07:37
The body string contains many anchors. I want to loop over all of them, check for exactly 'www.example.com' (not its sub-domains), and replace each matching anchor with its text. Thanks. – khalil Oct 01 '16 at 07:39
@khalil Try the answer I posted below. It should resolve your issue. – Manish Oct 01 '16 at 07:40
Are you sure you want to go the regex path instead of using one of the [plethora of libraries available for you to actually reliably parse a DOM](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php)? – Wrikken Oct 01 '16 at 17:01

score 5 · Answer 1 · answered Oct 01 '16 at 07:38

Try this:

$body='<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

echo preg_replace('#<a.*?>([^>]*)</a>#i', '$1', $body);

answered Oct 01 '16 at 07:38

Manish

Thanks, this works for replacing all anchors, but sorry, I should have mentioned that I don't want to replace sub-domains in the href. It should match only if the href contains exactly 'www.example.com', and it should not replace any other domain or sub-domain. – khalil Oct 01 '16 at 07:46
Could you explain, with an example, what result you want to see? – Manish Oct 01 '16 at 07:53
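
For the requirement clarified in the comment above (unwrap only anchors whose href host is exactly www.example.com), one possible regex-based sketch follows. The function name `unwrap_example_links` is hypothetical and not part of any answer here. The pattern assumes a quoted `href` with an `http` or `https` scheme and no port number, so treat it as an illustration rather than a robust HTML parser.

```php
<?php
// Hypothetical helper (not from the original answers): unwraps only anchors
// whose href host is exactly www.example.com.
function unwrap_example_links(string $html): string
{
    // The href must start with http(s)://www.example.com and then continue with
    // "/", "?" or "#", or end at the closing quote. Hosts such as sub.example.com
    // or www.example.com.evil.test therefore do not match.
    $pattern = '~<a\b[^>]*\bhref=(["\'])https?://www\.example\.com(?:[/?#][^"\']*)?\1[^>]*>(.*?)</a>~is';

    // $2 is the anchor's inner content ($1 is the quote character).
    return preg_replace($pattern, '$2', $html);
}

$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a>'
      . ' while <a href="http://sub.example.com/">all kids</a> were watching ... </p>';

echo unwrap_example_links($body);
```

With this input, the first anchor is replaced by its text while the `sub.example.com` anchor is left untouched.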

score 2 · Answer 2 · answered Oct 01 '16 at 17:12

Without regexes.....

<?php

$d = new DOMDocument();
$d->loadHTML('<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>');
$x = new DOMXPath($d);
foreach($x->query('//a') as $anchor){
    $url = $anchor->getAttribute('href');
    $domain = parse_url($url,PHP_URL_HOST);
    if($domain == 'www.example.com'){
        $anchor->parentNode->replaceChild(new DOMText($anchor->textContent),$anchor);
    }
}

function get_inner_html( $node ) {
    $innerHTML= '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
echo get_inner_html($x->query('//body')[0]);

answered Oct 01 '16 at 17:12

Wrikken

Why use XPath when `$d->getElementsByTagName('a')` does the same? However, XPath may be interesting if you register a function to check the domain name and select only the link nodes that have this domain in your query: http://php.net/manual/en/domxpath.registerphpfunctions.php – Casimir et Hippolyte Oct 01 '16 at 21:47
You could use that as well for sure, I just needed a quick non-regex example where `XPath` is my default go-to. You don't strictly need a PHP function for the href though: `//a[starts-with(@href,'http://www.example.com')]` might work for the OP as well, depending on whether alternatives like `https://www.example.com` or `//www.example.com` are expected or not. – Wrikken Oct 02 '16 at 07:35
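
As a rough illustration of the `getElementsByTagName('a')` variant mentioned in the comments above, here is a minimal sketch. The function name `unwrap_links_for_host` and its parameters are hypothetical, and the sketch assumes a non-empty HTML fragment that libxml can parse.

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function unwrap_links_for_host(string $html, string $host): string
{
    $doc = new DOMDocument();
    // Suppress warnings about imperfect markup; libxml wraps the fragment in <html><body>.
    @$doc->loadHTML($html);

    // getElementsByTagName() returns a live list, so copy it before changing the tree.
    $anchors = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($anchors as $anchor) {
        $linkHost = parse_url($anchor->getAttribute('href'), PHP_URL_HOST);
        // Compare the whole host name, so sub-domains are left alone.
        if (is_string($linkHost) && strcasecmp($linkHost, $host) === 0) {
            $text = $doc->createTextNode($anchor->textContent);
            $anchor->parentNode->replaceChild($text, $anchor);
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo unwrap_links_for_host(
    '<p>A <a href="http://www.example.com/video/">one</a> B <a href="http://sub.example.com/">two</a></p>',
    'www.example.com'
);
```

Because the host is compared after `parse_url()`, this also covers `https://` and protocol-relative `//www.example.com` links, which a `starts-with(@href, ...)` test on a single prefix would miss.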

score 1 · Answer 3 · edited Jun 20 '20 at 09:12

You could simply use strip_tags() and htmlspecialchars() here.

strip_tags - Strip HTML and PHP tags from a string

htmlspecialchars - Convert special characters to HTML entities

Step 1: Use strip_tags() to strip all tags except the <p> tag.

Step 2: To display the result with the remaining HTML tags shown literally (rather than rendered by the browser), escape it with htmlspecialchars().

echo htmlspecialchars(strip_tags($body, '<p>'));

When there's already a built-in PHP function, I think it's better and more compact to use that instead of preg_replace.
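
To make the two steps concrete, here is a small sketch that runs them separately on the string from the question. The variable names `$stripped` and `$escaped` are only illustrative.

```php
<?php
$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

// Step 1: remove every tag except <p>; the anchor text itself is kept.
$stripped = strip_tags($body, '<p>');

// Step 2: escape the result so the <p> tags are shown as text instead of rendered.
$escaped = htmlspecialchars($stripped);

echo $stripped, "\n";
// <p>The man was dancing like a little boy while all kids were watching ... </p>
echo $escaped, "\n";
// &lt;p&gt;The man was dancing like a little boy while all kids were watching ... &lt;/p&gt;
```

Note that `strip_tags()` removes every anchor regardless of its href, so on its own this approach cannot restrict the replacement to a particular host.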

edited Jun 20 '20 at 09:12

Community

answered Oct 01 '16 at 07:38

Indrasis Datta

Sorry, I should have mentioned that I don't want to replace sub-domains in the href. It should match only if the href contains exactly 'www.example.com', and it should not replace any other domain or sub-domain. – khalil Oct 01 '16 at 07:44

score 1 · Answer 4 · edited Oct 01 '16 at 08:11

You can use this code:

Regex: /< a.*?>|<a.*?>|<\/a>/g

$body='<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

echo preg_replace('/< a.*?>|<a.*?>|<\/a>/', ' ', $body);

You can test it and see the example matches here: https://regex101.com/r/mgYjoB/1

edited Oct 01 '16 at 08:11

answered Oct 01 '16 at 08:00

Farhang Negari

There shouldn't be any space before a valid tag name, hence `< a` is not valid. Also you could shorten your regex to this **`<\/?a\b[^<>]*>`** – revo Oct 01 '16 at 18:19
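
As a quick check of the shortened pattern suggested in the comment above, the following sketch applies it to the string from the question. The `i` flag and the empty replacement string are assumptions added here. An empty string is used instead of a space so that the surrounding spacing is preserved.

```php
<?php
$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

// \b keeps the pattern from matching other tags that start with "a", such as <abbr>.
$result = preg_replace('/<\/?a\b[^<>]*>/i', '', $body);

echo $result;
// <p>The man was dancing like a little boy while all kids were watching ... </p>
```

Like the answer it refines, this removes every anchor tag and does not look at the href at all.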
