I have several HTML files that share the same <section> structure but have different content.

I would like to know whether it is possible to remove these sections from multiple files at once using Sublime Text.

Example:

<section class="all-classes" id="section1">
     content 
</section>
<section class="all-classes" id="section2-do-not-remove-section">
     content 
</section>
<section class="all-classes" id="section3">
     content 
</section>
<section class="all-classes" id="section4">
     content 
</section>

In this example, I would like to remove sections 1, 3, and 4 and keep section 2.

  • You want to delete all sections where the format of the id is `section#` where `#` is 1,2,3,4,5,6,7,8,9,10,11,... ? – Niel Godfrey Pablo Ponciano Sep 03 '21 at 04:02
  • @Ouroborus From [What topics can I ask about here?](https://stackoverflow.com/help/on-topic) in the [help], software questions are allowed if they cover *"[...] software tools commonly used by programmers".* Sublime Text, like Vim, Emacs, VSCode, etc., is a programming editor, and there are [tens of thousands of questions](https://stackoverflow.com/questions/tagged/vim+or+vi+or+emacs+or+visual-studio-code) about them on this site that are perfectly on-topic. Also, this is a programming question because the answer is to use an HTML parser. – MattDMo Sep 03 '21 at 11:31
  • This is actually a job for an HTML parser, not regex. See [this](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) for a good laugh, but also for some good answers explaining why regex is not the tool for this job. [`BeautifulSoup4`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) or [`lxml`](https://lxml.de) are the tools of choice for Python, I don't know about other languages. – MattDMo Sep 03 '21 at 11:34

1 Answer

As mentioned by MattDMo in the comments, an HTML parser would be your best option for this job.

For simple cases where a quick find-and-replace is enough, you can use this regex (with regular-expression mode enabled in the search panel):

<section.*id="section[\d]*"[\s\S]*?<\/section>

Where:

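  • `<section.*` - Matches the start of an opening section tag and any attributes that precede the id on the same line, e.g. `<section class="all-classes"`
  • `id="section[\d]*"` - Matches an id made of `section` followed only by digits, e.g. `id="section32"`. An id such as `section2-do-not-remove-section` does not match, because the closing quote must come right after the digits. Note that `*` also allows zero digits, so a bare `id="section"` would match too.
  • `[\s\S]*?` - Matches any characters (whitespace or not, including newlines) in a non-greedy way. This prevents the match from spanning multiple sections.
  • `<\/section>` - Matches the closing tag `</section>`. Because the preceding part is non-greedy, this is always the nearest `</section>` tag.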
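
If you would rather run the same pattern outside the editor, here is a minimal Python sketch that applies it with the standard `re` module. The names `strip_numbered_sections` and `clean_folder` are hypothetical helpers introduced only for illustration, and the sketch assumes each opening `<section ...>` tag sits on its own line, as in the question.

```python
import re
from pathlib import Path

# The pattern from the answer, unchanged. In Python's `re`, `.` does not match
# a newline by default, so `.*` stays on the line of the opening tag.
SECTION_RE = re.compile(r'<section.*id="section[\d]*"[\s\S]*?<\/section>')


def strip_numbered_sections(html: str) -> str:
    """Return `html` with every match of SECTION_RE removed."""
    return SECTION_RE.sub("", html)


def clean_folder(folder: str) -> None:
    """Hypothetical helper: rewrite every *.html file in `folder` in place."""
    for path in Path(folder).glob("*.html"):
        text = path.read_text(encoding="utf-8")
        path.write_text(strip_numbered_sections(text), encoding="utf-8")


# The example from the question.
sample = '''<section class="all-classes" id="section1">
     content
</section>
<section class="all-classes" id="section2-do-not-remove-section">
     content
</section>
<section class="all-classes" id="section3">
     content
</section>
<section class="all-classes" id="section4">
     content
</section>
'''

cleaned = strip_numbered_sections(sample)
print(cleaned)
```

The blank lines left behind by the removed sections are not cleaned up here. If several sections share a single line, the greedy `.*` can start at a section you want to keep and run into a later id, so back up the files before calling something like `clean_folder`.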

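WARNING: If you have nested sections (a section within a section), this will not work, because the non-greedy match stops at the first inner </section> and leaves the rest of the outer section behind. You have to use an HTML parser for that.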
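
For the nested case, one possible approach is sketched below using only Python's standard `html.parser` module rather than BeautifulSoup or lxml. It is a rough illustration, not a full HTML rewriter. The class `SectionStripper`, the function `strip_sections`, and the id pattern `section\d+` are assumptions made for this example. End tags are re-emitted in lowercase, and processing instructions are not preserved.

```python
import re
from html.parser import HTMLParser

# Assumption: sections to remove have ids like "section1", "section42".
ID_RE = re.compile(r"section\d+")


class SectionStripper(HTMLParser):
    """Copies the input to `self.out`, skipping matching <section> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a section being removed

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == "section":
                self.skip_depth += 1  # nested section inside a removed one
            return
        if tag == "section" and ID_RE.fullmatch(dict(attrs).get("id") or ""):
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "section":
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        if not self.skip_depth:
            self.out.append(f"<!{decl}>")


def strip_sections(html: str) -> str:
    parser = SectionStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


nested = (
    '<section id="section1"><section id="inner">x</section>tail</section>'
    '<section id="section2-do-not-remove-section">keep &amp; stay</section>'
)
result = strip_sections(nested)
print(result)
```

The depth counter is what the regex cannot express: every nested `<section>` opened inside a removed one increments it, and output resumes only once the matching outer `</section>` brings it back to zero.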