
How can I replace all the anchors with their anchor text? My code is:

$body='<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

I want the result to be:

<p>The man was dancing like a little boy while all kids were watching ... </p>

I used:

$body= preg_replace('#<a href="https?://(?:.+\.)?ok.co.*?>.*?</a>#i', '$1', $body);

and the result is:

<p>The man was while all kids were watching ... </p>
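When that pattern matches, the link text disappears because the pattern has no capturing group, so `$1` expands to an empty string. Below is a minimal sketch of a corrected pattern, assuming every link uses a double-quoted `href` and that only links whose host is exactly `www.example.com` should be unlinked; the function name `unlinkExampleCom` is hypothetical.

```php
<?php

// Hypothetical helper: replace <a> elements whose href host is exactly
// www.example.com with their inner text. Assumes double-quoted href values.
function unlinkExampleCom(string $html): string
{
    // The lookahead requires "/", "?", "#", ":" or the closing quote right after
    // the host, so www.example.com.evil.org or sub-domains are not matched.
    $pattern = '~<a\b[^>]*\bhref="https?://www\.example\.com(?=[/?#:"])[^"]*"[^>]*>(.*?)</a>~is';

    // (.*?) is capturing group 1, so '$1' now holds the anchor text instead of ''
    return preg_replace($pattern, '$1', $html) ?? $html;
}

$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';
echo unlinkExampleCom($body), "\n";
```

The lookahead is a design choice to stop `www.example.com` from matching as a prefix of a longer host name.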
Wiktor Stribiżew
khalil
  • When should the ellipsis ... appear? After a certain number of words or characters? – Indrasis Datta Oct 01 '16 at 07:37
  • The body string contains many anchors, and I want to loop over all of them, checking for exactly 'www.example.com' (not its sub-domains) and replacing each matching anchor with its text. Thanks. – khalil Oct 01 '16 at 07:39
  • @khalil Try the answer I posted below. It should resolve your issue. – Manish Oct 01 '16 at 07:40
  • Are you sure you want to go the regex path instead of the [plethora of libraries available for you to actually reliably parse a DOM](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php)? – Wrikken Oct 01 '16 at 17:01

4 Answers

Try this:

$body='<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

echo preg_replace('#<a.*?>([^>]*)</a>#i', '$1', $body);
Manish
  • Thanks, this works to replace all anchors, but sorry, I should have mentioned that I don't want to replace sub-domains in the href: it should match only when the href contains exactly 'www.example.com', and not replace any other domain or sub-domain. – khalil Oct 01 '16 at 07:46
  • Could you give an example of the output you expect? – Manish Oct 01 '16 at 07:53
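For the follow-up requirement in these comments (unlink only anchors whose host is exactly `www.example.com`), one option is to keep a broad anchor regex like the one in this answer and move the host check into a `preg_replace_callback()` callback that uses `parse_url()`. A sketch, assuming double-quoted `href` attributes; `unlinkHost` is a hypothetical name.

```php
<?php

// Hypothetical helper: unlink anchors whose href host equals $host exactly.
// Assumes the href value is double-quoted; other anchors are left untouched.
function unlinkHost(string $html, string $host): string
{
    $pattern = '~<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>~is';

    $result = preg_replace_callback($pattern, function (array $m) use ($host): string {
        // $m[0] = whole anchor, $m[1] = href value, $m[2] = anchor text
        $linkHost = parse_url($m[1], PHP_URL_HOST);
        // parse_url() yields null (no host, e.g. relative URLs) or false (malformed URL)
        $isMatch = is_string($linkHost) && strcasecmp($linkHost, $host) === 0;
        return $isMatch ? $m[2] : $m[0];
    }, $html);

    return $result ?? $html;
}

$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';
echo unlinkHost($body, 'www.example.com'), "\n";
```

Comparing the parsed host avoids hand-writing the domain into the pattern, and `strcasecmp()` is used because host names are case-insensitive.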
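Without regexes:

<?php

$d = new DOMDocument();
// loadHTML() wraps the fragment in <html><body>
$d->loadHTML('<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>');
$x = new DOMXPath($d);
// query() returns a snapshot list, so replacing nodes inside the loop is safe
foreach($x->query('//a') as $anchor){
    $url = $anchor->getAttribute('href');
    // compare only the host, so sub-domains such as video.example.com don't match
    $domain = parse_url($url, PHP_URL_HOST);
    if($domain === 'www.example.com'){
        // swap the <a> element for a plain text node holding its text
        $anchor->parentNode->replaceChild(new DOMText($anchor->textContent), $anchor);
    }
}

// serialize the children of a node (here: <body>) without the node itself
function get_inner_html( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
echo get_inner_html($x->query('//body')->item(0));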
Wrikken
  • Why use XPath when `$d->getElementsByTagName('a')` does the same? However, XPath may be useful if you register a PHP function to check the domain name and select only the link nodes that have this domain in your query: http://php.net/manual/en/domxpath.registerphpfunctions.php – Casimir et Hippolyte Oct 01 '16 at 21:47
  • You could certainly use that as well; I just needed a quick non-regex example, and `XPath` is my default go-to. You don't strictly need a PHP function for the href though: `//a[starts-with(@href,'http://www.example.com')]` might work for the OP as well, depending on whether alternatives like `https://www.example.com` or `//www.example.com` are expected or not. – Wrikken Oct 02 '16 at 07:35
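If you switch to `getElementsByTagName('a')` as suggested in the comments, note that it returns a live `DOMNodeList`: replacing anchors while iterating it forward can skip elements. A sketch that walks the list backwards instead, keeping the same `parse_url()` host check as the answer; `unlinkWithDom` is a hypothetical name.

```php
<?php

// Hypothetical helper: unlink anchors whose host is exactly $host, using DOM.
function unlinkWithDom(string $html, string $host): string
{
    $doc = new DOMDocument();
    // loadHTML() wraps the fragment in <html><body>; @ hides warnings for sloppy markup
    @$doc->loadHTML($html);

    // getElementsByTagName() is live: walk it from the end so that removing an
    // anchor does not shift the indexes still to be visited.
    $anchors = $doc->getElementsByTagName('a');
    for ($i = $anchors->length - 1; $i >= 0; $i--) {
        $anchor = $anchors->item($i);
        $linkHost = parse_url($anchor->getAttribute('href'), PHP_URL_HOST);
        if (is_string($linkHost) && strcasecmp($linkHost, $host) === 0) {
            $anchor->parentNode->replaceChild($doc->createTextNode($anchor->textContent), $anchor);
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment
    $out = '';
    $bodyEl = $doc->getElementsByTagName('body')->item(0);
    foreach ($bodyEl->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';
echo unlinkWithDom($body, 'www.example.com'), "\n";
```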
You could simply use strip_tags() and htmlspecialchars() here.

strip_tags - Strip HTML and PHP tags from a string

htmlspecialchars - Convert special characters to HTML entities

Step 1: Use strip_tags() to strip all tags except the <p> tag.

Step 2: Since we want to display the resulting string with its HTML tags visible, escape it with htmlspecialchars().

echo htmlspecialchars(strip_tags($body, '<p>'));

When PHP already has a built-in function for the job, I think it's better and more compact to use it instead of preg_replace().
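`strip_tags()` removes every tag that is not in the allowed list, so it cannot tell `www.example.com` links apart from any others. A short sketch of what the two calls produce for the sample `$body`; `htmlspecialchars()` is only needed if you want to show the markup as text rather than render it.

```php
<?php

$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

// Keep <p>, drop every other tag (including all <a> tags, whatever their href)
$stripped = strip_tags($body, '<p>');
echo $stripped, "\n";

// Escape the result so the <p> tags are shown literally in a browser
echo htmlspecialchars($stripped), "\n";
```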

Community
Indrasis Datta
  • Sorry, I should have mentioned that I don't want to replace sub-domains in the href: it should match only when the href contains exactly 'www.example.com', and not replace any other domain or sub-domain. – khalil Oct 01 '16 at 07:44
You can use this code:

Regex: /< a.*?>|<a.*?>|<\/a>/g

$body='<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

echo preg_replace('/< a.*?>|<a.*?>|<\/a>/', '', $body);

Test it and see the example matches: https://regex101.com/r/mgYjoB/1

Farhang Negari
  • There shouldn't be any space before a valid tag name, hence `< a` is not valid. Also you could shorten your regex to this **`<\/?a\b[^<>]*>`** – revo Oct 01 '16 at 18:19
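Putting the shorter pattern from the comment above together with an empty replacement string (replacing with a space would leave doubled spaces around the link text) gives a sketch like this. Like the answer, it removes every anchor tag regardless of its `href`, so it does not address the exact-host requirement from the question's comments.

```php
<?php

$body = '<p>The man was <a href="http://www.example.com/video/">dancing like a little boy</a> while all kids were watching ... </p>';

// <\/?a\b matches "<a" or "</a" but not "<abbr" or "<area";
// [^<>]* then consumes the attributes up to the closing ">"
$result = preg_replace('/<\/?a\b[^<>]*>/i', '', $body);
echo $result, "\n";
```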