I tried to research and write this expression myself, but I gave up; I could never get the hang of regex. :) Ultimately, I want to clean up a large batch of HTML files. I need two expressions:

  1. I want to select code that starts with <!DOCTYPE and ends with <div id="content">
  2. I also want to select code that starts with </div><!-- end content --> and ends with </html>

How would you write out these expressions?

EdwardM
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Kimball Robinson Jan 14 '16 at 18:06
  • You should not use regular expressions to parse HTML. See this answer: http://stackoverflow.com/a/1732454/116810 – Kimball Robinson Jan 14 '16 at 18:06
  • Cool, thanks Kimball, I wasn't sure what to search for, I will look into those posts – EdwardM Jan 14 '16 at 18:07
  • Why are you cleaning up HTML? Are you trying to tidy it up? Are you trying to remove unsafe tags, or something? There are tools out there to do that. But you need to edit your question to explain *why* you are doing what you are doing, without the assumption of using RegExes. – Kimball Robinson Jan 14 '16 at 18:10
  • @KimballRobinson I think the term you're looking for is "XY Problem". – erip Jan 14 '16 at 18:12
  • @Kimball - ok, to be more specific. I am taking HTML pages for a client and I want to remove these lines of code so that we can easily paste the remaining "body" content back into a Content Management System. Hopefully that clears it up. If there are tools, can you name some for me? I am using Dreamweaver and Visual Studio at the moment. But I also have Brackets and Atom installed as well. Perhaps there are plug-ins for those? – EdwardM Jan 14 '16 at 18:14
  • So, are you trying to use your text editor's regex (find/replace) tools to pull out the central content? If so, which editor(s) are you trying? Or are you writing a program? If so, what language(s)? – Kimball Robinson Jan 14 '16 at 18:16
  • I was simply attempting to use the Find and Replace tools in VS or Dreamweaver, checking the Use RegEx option, but I guess I should not be doing that? – EdwardM Jan 14 '16 at 18:17
  • @edwardm I suppose you can use regex tools for this if you have a small set of files. If you know how to write a program, I would lean toward doing that. However, I am not familiar with the VS and Dreamweaver toolkits, so I am not sure which dialect of regular expressions they use. I suggest you add those to the question tags, though. – Kimball Robinson Jan 14 '16 at 18:20
  • Ok, gotcha. Didn't know there were different dialects of regex :) – EdwardM Jan 14 '16 at 18:21
  • The IDE I use lets me highlight a matching block of code. Looks like VS does this too: http://dailydotnettips.com/2013/08/19/how-to-select-a-block-of-code-in-visual-studio/ – Kimball Robinson Jan 14 '16 at 18:24
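
For the "write a program" route suggested in the comments, here is a minimal sketch in Python of the two expressions the question asks for. It assumes each file contains exactly one `<!DOCTYPE ... <div id="content">` header and one `</div><!-- end content --> ... </html>` footer, with the markers written exactly as in the question. The names `HEAD`, `TAIL`, and `extract_body` are made up for illustration, and this is plain text matching, not HTML parsing, so it will break if the markers vary between files.

```python
import re

# Expression 1: from <!DOCTYPE through the opening <div id="content">.
# re.DOTALL lets "." match newlines; ".*?" is non-greedy, so the match
# stops at the first <div id="content"> it finds.
HEAD = re.compile(r'<!DOCTYPE.*?<div id="content">', re.DOTALL | re.IGNORECASE)

# Expression 2: from </div><!-- end content --> through </html>.
# "\s*" tolerates optional whitespace between the tag and the comment
# (an assumption; the question shows them adjacent).
TAIL = re.compile(r'</div>\s*<!-- end content -->.*?</html>', re.DOTALL | re.IGNORECASE)


def extract_body(html: str) -> str:
    """Remove the header and footer spans, returning what is left in between."""
    html = HEAD.sub("", html, count=1)
    html = TAIL.sub("", html, count=1)
    return html.strip()


sample = """<!DOCTYPE html>
<html><head><title>t</title></head>
<body><div id="content">
<p>Keep me</p>
</div><!-- end content -->
<footer>x</footer></body>
</html>"""

print(extract_body(sample))  # prints: <p>Keep me</p>
```

In an editor's find/replace dialog the same two patterns may need adjusting: if the dialect has no "dot matches newline" option, `[\s\S]*?` is a common substitute for `.*?`.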

1 Answer

Without using VS, I would guess you can select a code block (e.g. matching braces or HTML tags) by

Kimball Robinson