How to delete a specific tag using regex

Question

I'm trying to delete a ul tag and nested tags inside that ul.

<ul class="related">
<li><a href="/related-1.html" target="_blank">Related article</a></li>
<li><a href="/related-2.html" target="_blank">Related article 2</a></li>
</ul>

I have already removed the nested li elements inside the ul with the following (I'm using PHP, and the content is pulled from a database into $content):

$content = $rq['content'];  //here is the <ul class="related">... code
$content1 = preg_replace('~<ul[^>]*>\K(?:(?!<ul).)*(?=</ul>)~Uis', '', $content);   //it works here

This leaves the following string in $content1:

<ul class="related"></ul>

So how do I delete this remaining piece of markup using regex? I tried a similar pattern but did not get the result I wanted.

$finalcontent = preg_replace('~<ul[^>]*>\K.*(?=</ul>)~Uis', '', $content1);

I have no idea why you would want to remove elements from HTML this way, there are so many easier options available. — Ryan Wilson, Feb 07 '18 at 19:32
If I understand you correctly, you're trying to get nothing from your string? So delete `<ul>` and its contents? Just use `~<ul[^>]*>.*</ul>~iUs`. But as @RyanWilson said, regex isn't the best option for this. It's unclear what the expected output is. — ctwheels, Feb 07 '18 at 19:34
ok, let me explain myself XD. i have hundreds of files from a db, so this ul is inside every file and need to be replaced. i dont know if this is the best option, using php and deleting ul and then updating the same file with the final result, sorry guys — Jaciel Lv, Feb 07 '18 at 19:36
@JacielLv what format are the *hundreds of files from a db* in? By that I mean are they entirely HTML? — ctwheels, Feb 07 '18 at 19:40
for example, i can pull content from db and put it in $content1, $content1 contains HTML code, and inside this bunch of code, there is always a pattern (the ul class=""... in question). the entire HTML is inside a string(which i call $content), so i need to use regex to find this particular `<ul>` and delete it or substitute it with a blank ''. im sorry for my explanation :( — Jaciel Lv, Feb 07 '18 at 19:51
Well in that case you should use an HTML parser. See [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) for more info. Regex should only be used on a **limited, known** set of HTML — ctwheels, Feb 07 '18 at 19:52

ngj · Accepted Answer · 2018-02-07T21:59:47.313

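Your second pattern still uses \K and a lookahead, so the match covers only the (now empty) content between the tags, and the tags themselves are never replaced. Matching the whole element, tags included, may suit your purpose: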

$content1 = '<p>Foo</p><ul class="related"></ul><p>Bar</p>';
$finalcontent = preg_replace('~<ul[^>]*>.*</ul>~Uis', '', $content1);
echo $finalcontent;

The preg_replace call should remove all occurrences of <ul...>...</ul> from $content1. For the given example content, it returns:

<p>Foo</p><p>Bar</p>
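
To illustrate why the U (ungreedy) modifier matters in this pattern, here is a small sketch with two lists in one string. The sample markup and variable names are made up for illustration; only preg_replace and its optional count argument are used.

```php
<?php
// Hypothetical sample input: two lists with a paragraph between them.
$html = '<ul class="related"><li>A</li></ul><p>Keep me</p><ul><li>B</li></ul>';

// With U, .* is lazy: each <ul>...</ul> pair is matched on its own.
$lazy = preg_replace('~<ul[^>]*>.*</ul>~Uis', '', $html, -1, $lazyCount);

// Without U, .* is greedy: a single match runs from the first <ul> to the last </ul>.
$greedy = preg_replace('~<ul[^>]*>.*</ul>~is', '', $html, -1, $greedyCount);

printf("lazy:   '%s' (%d replacements)\n", $lazy, $lazyCount);     // '<p>Keep me</p>' (2 replacements)
printf("greedy: '%s' (%d replacements)\n", $greedy, $greedyCount); // '' (1 replacements)
```

Note that a lazy match stops at the first </ul> it finds, so a list nested inside another list would leave a dangling tail behind. That is one reason the comments suggest a parser for anything beyond simple, known markup.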

If you want the replacement to be more specific, e.g., in order to only remove occurrences of <ul class="related">...</ul> but not other types of <ul>...</ul>, you can make the regex more specific. For example:

$content1 = '<p>Foo</p><ul class="related"></ul><p>Bar</p><ul><li>Do not delete this one</li></ul>';
$finalcontent = preg_replace('~<ul class="related">.*</ul>~Uis', '', $content1);
echo $finalcontent;

For the given example, this would return:

<p>Foo</p><p>Bar</p><ul><li>Do not delete this one</li></ul>
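
Since the comments recommend an HTML parser, here is a minimal sketch of the same removal using PHP's DOM extension (DOMDocument and DOMXPath). The function name removeRelatedLists, the wrapper div, and the sample string are assumptions for illustration. The sketch also assumes the DOM extension is enabled and that the stored content is a body fragment, not a full page.

```php
<?php
// Hypothetical helper: removes every <ul> whose class list contains "related".
function removeRelatedLists(string $html): string
{
    if (trim($html) === '') {
        return $html; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();

    // Real-world HTML often triggers libxml warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a div so its top-level nodes are easy to get back.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "related" as a whole class token, even next to other classes.
    $query = '//ul[contains(concat(" ", normalize-space(@class), " "), " related ")]';

    // Copy the node list to an array so removals do not disturb the iteration.
    foreach (iterator_to_array($xpath->query($query)) as $ul) {
        $ul->parentNode->removeChild($ul);
    }

    // Serialize only the children of the wrapper div (the first div in the document).
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content1 = '<p>Foo</p><ul class="related"><li><a href="/related-1.html" target="_blank">Related article</a></li></ul><p>Bar</p><ul><li>Do not delete this one</li></ul>';
echo removeRelatedLists($content1);
```

For this input the output should contain the two paragraphs and the second list only, though libxml may insert line breaks between block elements when serializing. One caveat: loadHTML() assumes ISO-8859-1 unless the markup declares a charset, so non-ASCII content needs extra care before this is run over hundreds of database rows.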
