Remove anchors from text

Question

I need to remove anchor tags from some text, but I can't get it to work with a regex.
Only the anchor tags should go, not their content.
For instance, <a href="http://www.google.com/" target="_blank">google</a> would become google.

Yann Milin · Accepted Answer · score 14 · 2015-07-09T21:43:10.807

Exactly: it cannot be done properly with a regular expression.

Here is an example using the DOM:

$xml = new DOMDocument();
$xml->loadHTML($html); // $html holds the markup to clean

$links = $xml->getElementsByTagName('a');

// Walk the <a> elements backward and replace each one with its text content
for ($i = $links->length - 1; $i >= 0; $i--) {
    $linkNode = $links->item($i);
    // textContent is text only: nested markup such as <img> is dropped
    $lnkText = $linkNode->textContent;
    $newTxtNode = $xml->createTextNode($lnkText);
    $linkNode->parentNode->replaceChild($newTxtNode, $linkNode);
}

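It's important to loop backward whenever the loop changes the DOM: getElementsByTagName() returns a live list, so replacing a node removes it from $links and shifts the indexes of the nodes after it.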
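
For readers wondering how to get a string back out of the loop above, here is a minimal sketch under a few assumptions. The helper name `unwrapAnchors` and the wrapper `<div>` are illustrative, not part of the answer. It also differs from the answer in one respect: it moves each anchor's children out instead of using `textContent`, so nested elements survive. It assumes PHP 7 or later, and character-encoding handling is left out.

```php
<?php
// Hypothetical helper, not part of the answer above: unwraps every <a>
// element and returns the cleaned markup as a string.
function unwrapAnchors(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The wrapper <div> is an assumption of this sketch: without it, libxml
    // may wrap loose top-level text in a generated <p>.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Live list, so walk it backward while removing nodes.
    $links = $doc->getElementsByTagName('a');
    for ($i = $links->length - 1; $i >= 0; $i--) {
        $link = $links->item($i);
        // Move each child (text, <img>, <b>, ...) in front of the anchor.
        while ($link->firstChild !== null) {
            $link->parentNode->insertBefore($link->firstChild, $link);
        }
        $link->parentNode->removeChild($link);
    }

    // Serialize only the children of the wrapper <div>, leaving out the
    // doctype, <html> and <body> that loadHTML() adds.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo unwrapAnchors('<p>See <a href="http://www.google.com/" target="_blank">google</a> and <b>bold</b></p>');
```

If only the text of each link is wanted, the `replaceChild()` loop from the answer can be dropped in place of the inner `while` loop.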

edited Jul 09 '15 at 21:43

answered May 04 '11 at 09:26

Yann Milin

nice answer but how do i use it?..not really clear on usage. do i just echo out $newTxtNode? or lnkText??? – jcobhams Sep 25 '13 at 02:20
@VyrenMedia Op asked how to replace links by their text content, so at the end of this loop, you have a `DOMDocument` object with no links. You can use `$xml->saveHTML();` to get the whole html result. $lnkText contains the current link text as string, and you might want to [trim](http://php.net/trim) it. – Yann Milin Sep 25 '13 at 14:09
thanks a lot for your reply @Yann-Milin I however found a regex solution for this problem. – jcobhams Sep 25 '13 at 16:18
See below for the regular expression, the statement "_it cannot be done properly using a regular expression_." seems not to be true. – LarS Jul 08 '15 at 14:17
What I was trying to say is that any regular expression solution for this is not a good solution. You obviously _can_ run a regular expression query against html text, but it doesn't mean that you _should_ :) interesting read on the subject : [here](http://blog.codinghorror.com/parsing-html-the-cthulhu-way/) and [here](http://stackoverflow.com/a/1732454/165969) – Yann Milin Jul 09 '15 at 18:06

score 11 · Answer 2 · answered May 03 '11 at 13:48

Then you can try

preg_replace('/<\/?a[^>]*>/','',$Source);

I tried it online here on rubular

stema

This is not correct, as it would also strip other tags starting with a like article or address. – LarS Jul 08 '15 at 13:49
maybe a better regex: preg_replace('/<\s*\/?\s*a(?:\s*|\s+[^>]*)>/', '', $vars['panes']); – LarS Jul 08 '15 at 15:00
@CSᵠ answer is better for remove even middle text of 'a' tags – Sadee Sep 29 '15 at 14:22
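
The difference between the answer's pattern and the one suggested in the comments can be checked with a quick sketch. The sample markup is made up for illustration.

```php
<?php
$html = '<article><a href="#x">link</a></article>';

// Pattern from the answer: any tag whose name merely starts with "a" matches.
echo preg_replace('/<\/?a[^>]*>/', '', $html), "\n";
// prints: link

// Pattern from the comment: "a" must be followed by whitespace or ">".
echo preg_replace('/<\s*\/?\s*a(?:\s*|\s+[^>]*)>/', '', $html), "\n";
// prints: <article>link</article>
```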

score 6 · Answer 3 · edited Nov 11 '12 at 05:58

This question has already been answered, but I thought I would add my solution to the mix. I like it better than the accepted solution because it's a bit more to the point.

$content = 
    preg_replace(array('"<a href(.*?)>"', '"</a>"'), array('',''), $content);

Vikdor

answered Nov 11 '12 at 05:37

user1491929

This is nice and simple, can also use `$content = preg_replace(array('""', '""'), array('',''), $content);` in case "href" isn't the first attribute in the anchor tag. – David Thomas Oct 11 '16 at 03:49
@DavidThomas great addition! – user1491929 Oct 12 '16 at 13:49
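
Why attribute order matters for the pattern in this answer can be seen in a small sketch. The two sample strings are illustrative only.

```php
<?php
$patterns = array('"<a href(.*?)>"', '"</a>"');

// href first: both the opening and the closing tag are removed.
$hrefFirst = '<a href="http://www.google.com/" target="_blank">google</a>';
echo preg_replace($patterns, array('', ''), $hrefFirst), "\n";
// prints: google

// href second: only </a> matches, so the opening tag is left behind.
$hrefSecond = '<a target="_blank" href="http://www.google.com/">google</a>';
echo preg_replace($patterns, array('', ''), $hrefSecond), "\n";
// prints: <a target="_blank" href="http://www.google.com/">google
```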

score 6 · Answer 4 · edited May 03 '11 at 13:32

You are looking for strip_tags().

<?php

// outputs 'google'
echo strip_tags('<a href="http://www.google.com/" target="_blank">google</a>');

user229044

answered May 03 '11 at 13:31

Pekka

I need to maintain other tags, I only need to remove anchors. – Lior May 03 '11 at 13:35
@Lior ah, I see. `strip_tags` does indeed not do that. There is an implementation in the user contributed notes that may help you: http://php.net/manual/en/function.strip-tags.php#100054 – Pekka May 03 '11 at 13:36
@Pekka You can pass a second argument to `strip_tags()` that is a string of "allowable_tags": http://php.net/manual/en/function.strip-tags.php. – Jasper Sep 17 '12 at 16:40
@Jasper but that won't help here, will it? He would have to specify all tags that exist in `$allowable_tags` – Pekka Sep 17 '12 at 16:41
@Pekka It is unfortunate that you have to blacklist rather than being able to whitelist what tags you want to remove, but using some knowledge of what type of content is being parsed you can probably get that blacklist down to a small list. – Jasper Sep 17 '12 at 16:46
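
The second argument discussed in these comments can be sketched as follows, assuming the content is known to contain only `<p>`, `<b>` and `<a>` tags (that list is an assumption made for the example).

```php
<?php
$html = '<p>See <a href="http://www.google.com/">google</a> and <b>bold</b></p>';

// Everything except <p> and <b> is stripped, so the anchor tags go
// but their text stays.
echo strip_tags($html, '<p><b>');
// prints: <p>See google and <b>bold</b></p>
```

Any tag missing from the list is stripped as well, which is the limitation raised above.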

score 5 · Answer 5 · answered May 03 '11 at 13:36

using regex:

preg_replace('/<a[^>]+>([^<]+)<\/a>/i','\1',$html);

CSᵠ

What if there is an `` element inside the anchor elements? – ridgerunner May 03 '11 at 14:18
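
The concern raised in the comment above can be checked with a short sketch. The nested `<b>` element is only an illustration; any element inside the anchor behaves the same way.

```php
<?php
$pattern = '/<a[^>]+>([^<]+)<\/a>/i';

// Text-only anchor: replaced by its captured text.
echo preg_replace($pattern, '\1', '<a href="#x">google</a>'), "\n";
// prints: google

// Anchor wrapping another element: ([^<]+) cannot match across the
// nested tag, so nothing is replaced.
echo preg_replace($pattern, '\1', '<a href="#x"><b>google</b></a>'), "\n";
// prints: <a href="#x"><b>google</b></a>
```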

Chrysus · Answer 6 · 2017-02-14T21:13:03.440

Much of the regex here did not help me. Some of it removes the content inside the anchor (which is not at all what OP asked for) and not all of the content at that, some of it will match any tag beginning with a, etc.

This is what I created for my needs at work. We had an issue where passing HTML to wkhtmltopdf that had anchor tags (with many data attributes and other attributes) would sometimes prevent the PDF from producing, so I wanted to remove those while keeping the text.

Regex:

/<\/?a( [^>]*)?>/ig

In PHP you can do:

$text = "<a href='http://www.google.com/'>Google1</a><br>" .
        "<a>Google2</a><br>" .
        "<afaketag href='http://www.google.com'>Google2</afaketag><br>" .
        "<afaketag>Google4</afaketag><br>" . 
        "<a href='http://www.google.com'><img src='someimage.jpg'></a>";
echo preg_replace("/<\/?a( [^>]*)?>/i", "", $text);

Outputs:

Google1<br>Google2<br><afaketag href='http://www.google.com'>Google2</afaketag><br><afaketag>Google4</afaketag><br><img src='someimage.jpg'>

score 0 · Answer 7 · answered May 03 '11 at 15:01

Have a try with:

$str = '<p>paragraph</p><a href="http://www.google.com/" target="_blank" title="<>">google -> foo</a><div>In the div</div>';
// first, extract anchor tag
preg_match("~<a .*?</a>~", $str, $match);
// then strip the HTML tags
echo strip_tags($match[0]),"\n";

output:

google -> foo
