PHP Strip all content around text

Question

I have text that looks like this, or one of a billion variants of it, for example:

 <div>content goes here... </div><div style="some style..."><span style="some styles..."><strong>[END_CONTACT]</strong></span></div><div>content goes here... </div>
 <div>content goes here... </div><div style="other style..."><span style="other styles..."><strong>[END_CONTACT]</strong></span></div><div>content goes here... </div>
 <div>content goes here... </div><div style="random stuff..."><span style="random stuff..."><strong>[END_CONTACT]</strong></span></div><div>content goes here... </div>
 and a billion variations of this...

I want to be able to remove any variation of the markup surrounding [END_CONTACT] so that all I am left with is this:

 <div>content goes here... </div><div>[END_CONTACT]</div><div>content goes here... </div>

How do I strip the content between the opening div tag and [END_CONTACT] and the content between [END_CONTACT] and the ending div tag?

Thanks

What sort of data is it, for example are you specifically ONLY trying to strip HTML / inline CSS data? — Martin, Mar 16 '21 at 16:23
Everything that is inside of the div tags with the exception of the [END_CONTACT] text — LargeTuna, Mar 16 '21 at 16:23
Wouldn't it be easier to just strip everything except the `[xxx]` and then add `div` back? — AbraCadaver, Mar 16 '21 at 16:24
The problem though is that this line of text is embedded deep in emails so I would in essence be stripping away all the email content. — LargeTuna, Mar 16 '21 at 16:25
The full context of your problem is not clear; can you elaborate? Are you able to use jQuery here? That would make it easier, since with parent() you could easily solve your problem — Farhan Ibn Wahid, Mar 16 '21 at 16:31
@Psycho if it's PHP, it's not usually an environment where jQuery will be present. — Martin, Mar 16 '21 at 16:36

Martin · Answer 1 · 2021-03-16T22:22:32.560

How do I strip the content between the opening div tag and [END_CONTACT] and the content between [END_CONTACT] and ending div tag?

If the term [END_CONTACT] and the <div> tag are always present, you can use a PCRE regex with preg_replace():

$string = preg_replace('/<div[^>]*>.*\[END_CONTACT\].*<\/div>/i','<div>[END_CONTACT]</div>',$string);
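One caveat worth checking before relying on this one-liner: `.*` is greedy, so if the string contains other `<div>` elements (as the question's own sample lines do), the match runs from the first `<div` to the last `</div>` and swallows the surrounding content. The snippet below runs the pattern, unchanged, on the first sample line from the question to show this.

```php
<?php
// First sample line from the question.
$input = '<div>content goes here... </div>'
       . '<div style="some style..."><span style="some styles..."><strong>[END_CONTACT]</strong></span></div>'
       . '<div>content goes here... </div>';

// The pattern from this answer, unchanged.
$result = preg_replace('/<div[^>]*>.*\[END_CONTACT\].*<\/div>/i', '<div>[END_CONTACT]</div>', $input);

// Greedy .* matched from the first <div to the last </div>,
// so both "content goes here..." blocks are gone.
echo $result, PHP_EOL; // prints <div>[END_CONTACT]</div>
```

This is the situation the asker runs into in the comments below; the pattern is fine when the marker's `<div>` is the only div on the line.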

Example:

$data = [];
$data[] = 'some text <div style="some style..."><span style="some styles..."><strong>[END_CONTACT]</strong></span></div>';
$data[] = 'something else etc.<div style="other style..."><span style="other styles..."><strong>[END_CONTACT]</strong></span></div>';
$data[] = '<div style="random stuff..."><span style="random stuff..."><strong>[END_CONTACT]</strong></span></div>';
$data[] = 'and a billion variations of this...';

foreach ($data as $row) {
    // Only the matched <div ...>...[END_CONTACT]...</div> run is replaced;
    // text before the first <div is left untouched.
    $string = preg_replace('/<div[^>]*>.*\[END_CONTACT\].*<\/div>/i', '<div>[END_CONTACT]</div>', $row);
    echo $string, PHP_EOL;
}

Output:

 some text <div>[END_CONTACT]</div>
 something else etc.<div>[END_CONTACT]</div>
 <div>[END_CONTACT]</div>
 and a billion variations of this...

UPDATE:

Sorry, wasn't clear about that in my original post. Is there any way to keep text or code outside of the string in question but still do the operation as you've suggested?

Try this regex in the PHP code above:

 (?!<div).(<div[^>]*>.*\[END_CONTACT\][^\div]*<\/div>)

Example:

 content content content... <div style="random stuff..."><span style="random stuff..."><strong>[END_CONTACT]</strong></span></div> content content content

Output:

  content content content... <div>[END_CONTACT]</div> content content content
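A note on the pattern above: `[^\div]` is a character class, not a "not `div`" construct; it excludes digits (`\d`) and the letters `i` and `v`. The leading `(?!<div).` also consumes one character in front of `<div`, which the replacement then drops. A sketch of an alternative that keeps each match inside a single `<div>...</div>` pair uses a tempered token, `(?:(?!<\/?div\b).)*`, which advances one character at a time but refuses to step over another opening or closing div tag. It assumes the marker's wrapper `<div>` contains no nested `<div>` (true for every sample in the question); the `s` flag lets `.` cross newlines in email bodies.

```php
<?php
// Tempered token: consume any character, but never cross a <div or </div>.
// Assumption: the div wrapping [END_CONTACT] has no nested divs.
$pattern = '/<div\b[^>]*>(?:(?!<\/?div\b).)*\[END_CONTACT\](?:(?!<\/?div\b).)*<\/div>/is';

$samples = [
    // Sample line from the question: divs before and after the marker.
    '<div>content goes here... </div><div style="some style..."><span style="some styles..."><strong>[END_CONTACT]</strong></span></div><div>content goes here... </div>',
    // Example from the update: plain text around the marker.
    'content content content... <div style="random stuff..."><span style="random stuff..."><strong>[END_CONTACT]</strong></span></div> content content content',
];

$results = [];
foreach ($samples as $sample) {
    $results[] = preg_replace($pattern, '<div>[END_CONTACT]</div>', $sample);
}

foreach ($results as $r) {
    echo $r, PHP_EOL;
}
```

Both the neighbouring divs and the surrounding text survive; only the div that actually holds the marker is rewritten.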

NOTE:

It must be stated that for complex, nested HTML you should use a DOM parser rather than regex.
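For comparison, a minimal sketch of the DOM-parser route using PHP's built-in DOMDocument and DOMXPath classes. The function name normalizeEndContact() is hypothetical, and the sketch assumes the marker sits in the innermost div whose entire text (ignoring whitespace) is exactly [END_CONTACT]. The fragment is wrapped in a full document with a charset declaration so that loadHTML() reads it as UTF-8, and only the children of <body> are serialized back.

```php
<?php
// Hypothetical helper: replace the innermost div whose whole text is the marker
// with a bare <div>marker</div>, leaving all other markup and text alone.
// Assumes the marker contains no double quotes (it is embedded in an XPath literal).
function normalizeEndContact(string $html, string $marker = '[END_CONTACT]'): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // silence warnings about sloppy email HTML
    $doc->loadHTML(
        '<!DOCTYPE html><html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // Divs whose normalized text equals the marker and that contain no nested div.
    $query = sprintf('//div[normalize-space(.) = "%s" and not(.//div)]', $marker);
    foreach (iterator_to_array($xpath->query($query)) as $div) {
        $replacement = $doc->createElement('div');
        $replacement->appendChild($doc->createTextNode($marker));
        $div->parentNode->replaceChild($replacement, $div);
    }

    // Serialize only what was inside <body>, i.e. the original fragment.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$line = '<div>content goes here... </div><div style="some style..."><span style="some styles..."><strong>[END_CONTACT]</strong></span></div><div>content goes here... </div>';
$out = normalizeEndContact($line);
echo $out, PHP_EOL;
```

Unlike the regex, this does not care what tags sit between the div and the marker, or in what order the attributes appear.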


This works great except if there is text outside of the tags: it will strip all of that away. Sorry, wasn't clear about that in my original post. Is there any way to keep text or code outside of the string in question but still do the operation as you've suggested? E.g. content content content... [END_CONTACT] content content content... — LargeTuna, Mar 16 '21 at 18:02
Here's the problem I'm running into: if the [END_CONTACT] has any divs after it, it will not find and replace the correct content; it just skips over it. E.g. this bombs out: content content content... [END_CONTACT] content content content. It just returns: content content content... [END_CONTACT] content content content — LargeTuna, Mar 16 '21 at 21:03
@LargeTuna you have over 2000 reputation on Stack Overflow, so it would be assumed that you are aware that you need to set out *all* the criteria of the question in the question, which saves everyone's time. Please **clearly** update your question with your **exact** criteria and what you've tried to resolve this. [**I have tested my answer**](http://sandbox.onlinephpfunctions.com/code/bbc8b854cbb77c9cfe6f86637b14da4e57ed024f) and it does what is desired. And as stated, what you *should* be using to deal with multilayered complex HTML is a proper PHP DOM parser. — Martin, Mar 16 '21 at 22:21

PHP Guru · Answer 2 · 2021-03-16T16:37:47.363


Use regular expressions! The following example using preg_replace will work as long as the markup is exactly div > span > strong with nothing in between, and the text inside the <strong> contains no angle brackets (which you should not put in HTML text anyway).

$result = preg_replace('#<div\b[^>]*><span\b[^>]*><strong\b[^>]*>([^<]*)</strong></span></div>#i', '<div>$1</div>', $html);
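Because `([^<]*)` captures whatever text sits in the `<strong>`, this pattern rewrites every div > span > strong block, not only the one holding the marker. The snippet below contrasts it with a variant pinned to `[END_CONTACT]`; the extra `<div><span><strong>Hello</strong></span></div>` block is a hypothetical input added only to show the difference.

```php
<?php
// Question's sample line plus a hypothetical unrelated div>span>strong block.
$html = '<div>content goes here... </div><div style="some style..."><span style="some styles..."><strong>[END_CONTACT]</strong></span></div><div>content goes here... </div>'
      . '<div><span><strong>Hello</strong></span></div>';

// Pattern from this answer: captures any text inside the <strong>.
$loose = preg_replace('#<div\b[^>]*><span\b[^>]*><strong\b[^>]*>([^<]*)</strong></span></div>#i', '<div>$1</div>', $html);

// Variant pinned to the marker: other div>span>strong blocks are left alone.
$strict = preg_replace('#<div\b[^>]*><span\b[^>]*><strong\b[^>]*>(\[END_CONTACT\])</strong></span></div>#i', '<div>$1</div>', $html);

echo $loose, PHP_EOL;
echo $strict, PHP_EOL;
```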

edited Mar 16 '21 at 16:37

answered Mar 16 '21 at 16:33


preg_replace doesn't accept the `g` argument as it's implied – Martin Mar 16 '21 at 16:34
Also your PCRE regex doesn't seem to actually match the examples given by the asker. – Martin Mar 16 '21 at 16:35
Sorry, but I don't think there's any guarantee any of those HTML tags will be present except the div tag... – Martin Mar 16 '21 at 16:39
