
I would like to remove anchor tags from a given string using a PHP regex, but only when the anchor is not inside another tag.

Input:

Hi Hello <a href="#">World</a>. This is <div class="some">testing <a href="#">content</a>. some more content</div>

Output:

Hi Hello. This is <div class="some">testing <a href="#">content</a>. some more content</div>

Thanks in advance.

Ramesh

2 Answers


Something like this:

$string = 'replace <a href="x">A</a> but not <div> <a>B</a> in tag </div> but also <a>C</a><div></div>';

// PHP's preg_replace already replaces every match, so there is no "g" modifier.
echo preg_replace('/<a[^>]*?>([^<]*)<\/a>(?![^<]*<\/)/i', '', $string);

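A negative lookahead ensures that the anchor tag isn't followed by plain text and then a closing tag (</).
This is a heuristic for "not enclosed by another tag": it can misjudge cases where another opening tag appears between the anchor and the enclosing closing tag.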

The content of the tag is in capture group 1, in case you want to replace by '\1' instead of by ''.
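
To make the difference between the two replacements concrete, here is a minimal sketch that runs the lookahead pattern both ways on the sample string. The variable names $pattern, $removed and $unwrapped are only illustrative, and the pattern uses just the i modifier because PHP's preg functions do not accept g.

```php
<?php
$string = 'replace <a href="x">A</a> but not <div> <a>B</a> in tag </div> but also <a>C</a><div></div>';

// Anchor with text-only content, not followed by "text then a closing tag".
$pattern = '/<a[^>]*?>([^<]*)<\/a>(?![^<]*<\/)/i';

// Replace with '' to drop the anchor together with its text.
$removed = preg_replace($pattern, '', $string);

// Replace with '$1' (same as '\1') to keep the text and drop only the tags.
$unwrapped = preg_replace($pattern, '$1', $string);

echo $removed, PHP_EOL;
// replace  but not <div> <a>B</a> in tag </div> but also <div></div>
echo $unwrapped, PHP_EOL;
// replace A but not <div> <a>B</a> in tag </div> but also C<div></div>
```

Note that removing the anchor leaves the whitespace around it in place, so a trailing \s* in the pattern may be wanted if the double space matters.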

If it's about div tags specifically, then this one will skip over the divs:

// \K discards the matched div from the result, so only top-level anchors are rewritten.
echo preg_replace('/<div.*?>.*?<\/div>\K|<a[^>]*?>([^<]*)<\/a>/i', '\1', $string);
LukStorms

I think this is not a job for regex, but I tried it anyway using a common trick with (*SKIP)(*FAIL):

'~(<(?!a\b)(\w+)\b(?>(?:(?!</?\2\b).)+(?1)?)*</\2>)(*SKIP)(*F)|<a\b.*?</a>\s*~si'
  • The first part, before (*SKIP)(*F), recursively matches any element other than <a> (including its nested content) and skips it.
  • The second part, after the pipe |, is the part to be matched: the anchor with optional whitespace at the end.
  • Flags used: s (PCRE_DOTALL), i (PCRE_CASELESS)
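
As a rough usage sketch, the pattern can be passed to preg_replace with an empty replacement. The variable names $input, $pattern and $result are illustrative, and the input is the sample string from the question.

```php
<?php
$input = 'Hi Hello <a href="#">World</a>. This is <div class="some">testing <a href="#">content</a>. some more content</div>';

// Single quotes keep the backslashes intact for PCRE.
$pattern = '~(<(?!a\b)(\w+)\b(?>(?:(?!</?\2\b).)+(?1)?)*</\2>)(*SKIP)(*F)|<a\b.*?</a>\s*~si';

// Elements other than <a> are matched and skipped; remaining anchors are removed.
$result = preg_replace($pattern, '', $input);

if ($result === null) {
    // preg_replace returns null on a PCRE error (for example, a backtrack limit).
    throw new RuntimeException('preg_replace failed: ' . preg_last_error());
}

echo $result, PHP_EOL;
```

On this input the top-level anchor is removed while the anchor inside the div is kept. The space before the removed anchor stays, because the pattern only consumes whitespace after the closing </a>.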

Try the pattern at regex101 or see eval.in for a PHP demo.

There are probably better solutions using DOMDocument or another parser.
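
For comparison, a parser-based version might look like the sketch below. It is not part of the original answer: the helper name removeTopLevelAnchors and the wrapper id are made up for illustration, and it assumes the fragment is plain ASCII (DOMDocument::loadHTML needs extra care for other encodings).

```php
<?php
// Hypothetical helper: removes <a> elements that sit at the top level
// of an HTML fragment and keeps anchors nested inside other elements.
function removeTopLevelAnchors(string $html): string
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc = new DOMDocument();
    // Wrap the fragment so its top-level nodes become children of a known element.
    $doc->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);

    // 'a' relative to the wrapper selects only its direct <a> children.
    // Copy the result to an array before mutating the tree.
    foreach (iterator_to_array($xpath->query('a', $wrapper)) as $anchor) {
        $wrapper->removeChild($anchor);
    }

    // Serialize the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$input = 'Hi Hello <a href="#">World</a>. This is <div class="some">testing <a href="#">content</a>. some more content</div>';
$clean = removeTopLevelAnchors($input);
echo $clean, PHP_EOL;
```

Because the parser builds a real tree, "not inside another tag" becomes a simple parent check, at the cost of the output being re-serialized rather than left byte-for-byte as written.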

bobble bubble