I am trying to learn Regex patterns for a class. I am making a simple HTML Lexer/Parser. I know this is not the best or most efficient way to make a Lexer/Parser but it is only to understand Regex patterns.
So my question is: how do I create a pattern that checks that the string does not contain any HTML tags (e.g. <TAG>) and does not contain any HTML entities (e.g. &ENT;)?
This is what I could come up with so far but it still does not work:
.+?(^(?:&[A-Za-z0-9#]+;)^(?:<.*?>))
EDIT: The only problem is that I can't negate the final outcome. I need a single, complete pattern that accomplishes this task, if that is possible, even if it isn't pretty. I didn't mention it before, but the pattern is supposed to match any simple text in an HTML page.
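To make the requirement concrete, here is a minimal sketch of what such a pattern might look like, assuming Python's re module (the question does not name a language) and assuming the tag and entity shapes from the attempt above. It puts a negative lookahead in front of every character instead of negating the final result. The names TAG, ENTITY, SIMPLE_TEXT, is_simple_text and tokenize are hypothetical, chosen only for this illustration.

```python
import re

# Assumed token shapes, based on the attempt in the question.
TAG = r"<[^>]*>"
ENTITY = r"&[A-Za-z0-9#]+;"

# Before each character, the negative lookahead checks that neither a tag
# nor an entity starts at this position; only then is the character consumed.
SIMPLE_TEXT = rf"(?:(?!{TAG}|{ENTITY}).)+"

SIMPLE_TEXT_RE = re.compile(SIMPLE_TEXT, re.DOTALL)

# For lexing, the three alternatives are tried in order at each position,
# so a text run can never begin in the middle of a tag or an entity.
TOKEN_RE = re.compile(
    rf"(?P<TAG>{TAG})|(?P<ENTITY>{ENTITY})|(?P<TEXT>{SIMPLE_TEXT})",
    re.DOTALL,
)


def is_simple_text(s: str) -> bool:
    """Return True if s is non-empty and contains no tag and no entity."""
    return SIMPLE_TEXT_RE.fullmatch(s) is not None


def tokenize(s: str) -> list[tuple[str, str]]:
    """Split s into (kind, text) pairs, kind being TAG, ENTITY or TEXT."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(s)]


if __name__ == "__main__":
    print(is_simple_text("Hello world"))   # True
    print(is_simple_text("Tom &amp; Jerry"))  # False
    print(tokenize("<p>Tom &amp; Jerry</p>"))
```

Under these assumptions, a lone "<" or "&" that never completes a tag or entity (as in "a & b < c") still counts as simple text, which may or may not be what the lexer should do.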