
I have a specific HTML source file that I need to scan and parse, and I am having trouble. I understand that HTML isn't supposed to be parsed with regex, but this is part of the assignment, so I do not have any choice.

So far, the regex I have is:

<[^/!].*?> for start tags

I have other regexes for end tags and comments, which work fine, but I cannot seem to type them here.

I am having trouble coming up with a regex to detect all the text between tags or in the body.

I would greatly appreciate any help possible.

Manfred Radlwimmer
Kyu Oh
  • Your requirement is unclear, but in any case regex isn't the best choice for your problem, especially if you expect to have nested HTML tags. Instead, look into using an HTML _parser_. – Tim Biegeleisen May 09 '17 at 05:14
  • Relevant: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – miqh May 09 '17 at 05:15
  • This is exactly what a **parser** was made for. – Jan May 09 '17 at 05:23
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Jan May 09 '17 at 05:23
  • show your html too – JYoThI May 09 '17 at 05:35
  • The printout you submit should include answers to the above questions and the console transcript of running the default ant build, which will apply your grammar to the three input files. Be sure to commit and push your code, as we will be looking at it as well as the printout. – Kyu Oh May 09 '17 at 05:38
  • The above is part of the HTML I need to scan and parse. Basically, I am having trouble using regex to detect content other than the tags. – Kyu Oh May 09 '17 at 05:38
  • The printout you submit should include answers to the above questions and the console transcript of running the default ant build, which will apply your grammar to the three input files. (I need to be able to detect this) – Kyu Oh May 09 '17 at 05:39
  • try this one: (.*?) – JYoThI May 09 '17 at 05:51
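For the text between tags, one possible approach is to treat text as "any run of characters that does not contain `<`" and try it last, after the tag and comment patterns. The sketch below is only an illustration in Python (the assignment's actual tool and grammar are not shown in the question). Only the start-tag pattern comes from the question; the end-tag and comment patterns, the `tokenize` helper, the group names, and the sample string are assumptions made up for the example. It does not handle declarations such as a doctype, `>` inside attribute values, or script content, which is part of why the comments recommend a real parser.

```python
import re

# One alternation, tried left to right at each position.
# Only the start-tag pattern is from the question; the others are assumed.
TOKEN_RE = re.compile(
    r"(?P<comment><!--.*?-->)"      # assumed comment pattern
    r"|(?P<end_tag></[^>]*>)"       # assumed end-tag pattern
    r"|(?P<start_tag><[^/!].*?>)"   # start-tag pattern from the question
    r"|(?P<text>[^<]+)",            # text: everything up to the next '<'
    re.DOTALL,                      # let '.' span newlines inside tags/comments
)


def tokenize(html):
    """Return a list of (kind, value) pairs in document order."""
    tokens = []
    for match in TOKEN_RE.finditer(html):
        kind = match.lastgroup      # name of the alternative that matched
        value = match.group()
        if kind == "text" and not value.strip():
            continue                # skip whitespace-only text between tags
        tokens.append((kind, value))
    return tokens


# Hypothetical sample input, loosely based on the text quoted in the comments.
sample = "<ul><!-- note --><li>Be sure to commit and push your code.</li></ul>"
for kind, value in tokenize(sample):
    print(kind, repr(value))
```

The key design choice is that `[^<]+` never needs to know which tags surround it: whatever is left over once tags and comments have been consumed is reported as text.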