
I'm trying to delete a ul tag and nested tags inside that ul.

<ul class="related">
<li><a href="/related-1.html" target="_blank">Related article</a></li>
<li><a href="/related-2.html" target="_blank">Related article 2</a></li>
</ul>

So far I have removed the nested li elements inside the ul with the following (I'm using PHP, and the content is pulled from a database into $content):

$content = $rq['content'];  //here is the <ul class="related">... code
$content1 = preg_replace('~<ul[^>]*>\K(?:(?!<ul).)*(?=</ul>)~Uis', '', $content);   //it works here

This leaves the following string in $content1:

<ul class="related"></ul>

How do I delete this remaining markup using regex? I tried a similar pattern but did not get the result I wanted:

$finalcontent = preg_replace('~<ul[^>]*>\K.*(?=</ul>)~Uis', '', $content1);
Scath
Jaciel Lv
  • I have no idea why you would want to remove elements from HTML this way, there are so many easier options available. – Ryan Wilson Feb 07 '18 at 19:32
  • If I understand you correctly, you're trying to get nothing from your string? So delete `<ul>` and its contents? Just use `~<ul[^>]*>.*</ul>~iUs`. But as @RyanWilson said, regex isn't the best option for this. It's unclear what the expected output is. – ctwheels Feb 07 '18 at 19:34
  • OK, let me explain myself XD. I have hundreds of files from a db, and this ul is inside every file and needs to be removed. I don't know if this is the best option: using PHP to delete the ul and then updating the same file with the final result. Sorry guys – Jaciel Lv Feb 07 '18 at 19:36
  • @JacielLv what format are the *hundreds of files from a db* in? By that I mean are they entirely HTML? – ctwheels Feb 07 '18 at 19:40
  • For example, I can pull content from the db and put it in $content1. $content1 contains HTML code, and inside this code there is always the same pattern (the ul class=""... in question). The entire HTML is inside a string (which I call $content), so I need to use regex to find this particular ul and delete it or replace it with an empty string ''. I'm sorry for my explanation :( – Jaciel Lv Feb 07 '18 at 19:51
  • 1
    Well in that case you should use an HTML parser. See [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) for more info. Regex should only be used on a **limited, known** set of HTML – ctwheels Feb 07 '18 at 19:52

1 Answer


The following may suit your purpose:

$content1 = '<p>Foo</p><ul class="related"></ul><p>Bar</p>';
$finalcontent = preg_replace('~<ul[^>]*>.*</ul>~Uis', '', $content1);
echo $finalcontent;

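The preg_replace call removes every occurrence of <ul...>...</ul> from $content1. The U modifier makes .* lazy, so each match stops at the first closing </ul>; the s modifier lets . match newlines, and i makes the match case-insensitive. Because the match stops at the first </ul>, a <ul> nested inside another <ul> would end the match early. For the given example content, it returns: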

<p>Foo</p><p>Bar</p>
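
The example above uses the already-emptied list. As a quick check, the same pattern can be run against the original markup from the question, with the nested li elements and line breaks still in place. This is only a sketch: the surrounding <p>Foo</p> and <p>Bar</p> are borrowed from the example above, and combining them with the question's markup is an assumption made for illustration.

```php
<?php
// Markup from the question, including the nested <li> items and line breaks.
// The <p> wrappers are illustrative and not part of the original content.
$content = <<<HTML
<p>Foo</p>
<ul class="related">
<li><a href="/related-1.html" target="_blank">Related article</a></li>
<li><a href="/related-2.html" target="_blank">Related article 2</a></li>
</ul>
<p>Bar</p>
HTML;

// U makes .* lazy, s lets . match newlines, i ignores case.
$finalcontent = preg_replace('~<ul[^>]*>.*</ul>~Uis', '', $content);

echo $finalcontent;
```

The line breaks that surrounded the list are not part of the match, so they stay in the result. If that matters, the pattern could be extended with \s* on either side, but that is a matter of taste.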

If you want the replacement to be more specific, e.g., in order to only remove occurrences of <ul class="related">...</ul> but not other types of <ul>...</ul>, you can make the regex more specific. For example:

$content1 = '<p>Foo</p><ul class="related"></ul><p>Bar</p><ul><li>Do not delete this one</li></ul>';
$finalcontent = preg_replace('~<ul class="related">.*</ul>~Uis', '', $content1);
echo $finalcontent;

For the given example, this would return:

<p>Foo</p><p>Bar</p><ul><li>Do not delete this one</li></ul>
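
As the comments on the question suggest, an HTML parser is usually more robust than regex once the markup varies (extra attributes, several classes, nested lists). The following is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The function name removeRelatedLists and the wrapper div are hypothetical and introduced only for this illustration, and character-encoding handling is left out, so non-ASCII content may need extra care.

```php
<?php
// Hypothetical helper (not from the question or the answer above):
// removes every <ul> whose class attribute contains the word "related".
function removeRelatedLists(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so it can be found again later.
    $doc->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//ul[contains(concat(" ", normalize-space(@class), " "), " related ")]';

    // Copy the node list to an array before removing nodes from the tree.
    foreach (iterator_to_array($xpath->query($query)) as $ul) {
        $ul->parentNode->removeChild($ul);
    }

    // Serialize only the children of the wrapper, not the added html/body/div.
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content1 = '<p>Foo</p><ul class="related"><li><a href="/related-1.html">Related article</a></li></ul>'
    . '<p>Bar</p><ul><li>Do not delete this one</li></ul>';
echo removeRelatedLists($content1);
```

Unlike the regex, this removes the whole element even when it contains nested lists, and it still matches when the ul has other classes or attributes. The serialized output may differ slightly in whitespace from the input.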
ngj