I'm trying to write a regex that will find a string of HTML tags inside a code editor (Khan Live Editor) and give the following error:
"You can't put <h1.. 2.. 3..> inside <p> elements."
This is the string I'm trying to match:
<p> ... <h1>
This is the string I don't want to match:
<p> ... </p><h1>
Instead, I expect a different error message to appear in that situation.
So in English I want a string that:
- starts with <p>
and
- ends with <h1>
but
- does not contain </p>.
It's easy enough to make this work if I don't care about the existence of a </p>. My expression looks like this, /<p>.*<h[1-6]>/, and it works fine. But I need to make sure that </p> does not come between the <p> and <h1> tags (or any <h#> tag, hence the <h[1-6]>).
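To make the over-matching concrete, here is a small sketch of how the current expression behaves on both cases. The sample strings and variable names are made up for illustration; only the regex itself comes from above.

```javascript
// The expression from above: the greedy .* will run straight across a closing </p>.
const currentRegex = /<p>.*<h[1-6]>/;

// Hypothetical sample inputs standing in for editor content.
const wanted = "<p> some text <h1>";        // should trigger the new message
const unwanted = "<p> some text </p><h1>";  // should NOT trigger it

console.log(currentRegex.test(wanted));   // true (what I want)
console.log(currentRegex.test(unwanted)); // true (the false positive I need to avoid)
```

So the open question is only how to stop the part between <p> and <h[1-6]> from crossing a </p>.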
I've tried a lot of different expressions from some other posts on here:
Regular expression to match a line that doesn't contain a word?
From which I tried: <p>^((?!<\/p>).)*$</h1>
regex string does not contain substring
From which I tried: /^<p>(?!<\/p>)<h1>$/
Regular expression that doesn't contain certain string
This link suggested: aa([^a] | a[^a])aa
Which doesn't work in my case because I need to exclude the specific string "</p>", not just its individual characters, since there might be other tags between <p> ... <h1>.
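For what it's worth, the negative-lookahead idea from the first linked post might be adaptable by dropping the ^ and $ anchors and repeating the lookahead before every character between the two tags. In /^<p>(?!<\/p>)<h1>$/ the lookahead only checks the single position right after <p>, so it cannot guard a longer stretch of text. The following is only a sketch under assumptions: the name `pBeforeHeading` and the sample strings are hypothetical, `[\s\S]` is used instead of `.` on the assumption that editor content spans several lines, and <p> tags with attributes are not handled.

```javascript
// Sketch: match <p>, then any run of characters where no position starts "</p>",
// then an opening heading tag. (?:(?!<\/p>)[\s\S])*? re-checks the lookahead
// before consuming each character, so the match can never cross a </p>.
const pBeforeHeading = /<p>(?:(?!<\/p>)[\s\S])*?<h[1-6]>/;

// Hypothetical sample inputs.
console.log(pBeforeHeading.test("<p> some text <h1>"));      // true
console.log(pBeforeHeading.test("<p> some text </p><h1>"));  // false
console.log(pBeforeHeading.test("<p><h1></h1></p>"));        // true
console.log(pBeforeHeading.test("<p>a\n<b>b</b>\n<h3>"));    // true (other tags and newlines are allowed)
```

Whether this is robust enough for real editor content is a separate question; it only expresses the three conditions listed earlier.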
I'm really stumped here. The regex I've tried seems like it should work... Any idea how I would make this work? Maybe I'm implementing the suggestions from other posts wrong?
Thanks in advance for any help.
Edit:
To answer why I need this done:
The problem is that <p><h1></h1></p> is a syntax error since h1 closes the first <p> and there is an unmatched </p>. The original syntax error is not informative, but in most cases it is correct; my example being the exception. I'm trying to pass the syntax parser a new message to override the original message if the regex finds this exception.
... and there is an unmatched </p>. The original syntax error is not informative, but in most cases it is correct; my example being the exception. I'm trying to pass the syntax parser a new message to override the original message if the regex finds this exception. – Dan Fletcher Nov 24 '15 at 18:53
..., etc before an explicit </p> as, in HTML5 (which has this flow-content rule) the </p> is completely optional. For instance: `<p>Paragraph 1.<p>Paragraph 2.<h1>Heading</h1><p>Paragraph 3.` is completely valid HTML5 and can be authored as such intentionally. – rgthree Nov 24 '15 at 18:57
... – Dan Fletcher Nov 24 '15 at 19:06
... element is immediately followed by..."_. The key there is the "<p> element" which means _the entire paragraph_ and not the opening p tag. Essentially, the authored content `<p>Paragraph 2.<h1>Heading</h1>` is valid as the paragraph is intentionally ending at the end of `Paragraph 2.` when the `<h1>` element intentionally ends the <p> element and starts a new flow content block. – rgthree Nov 24 '15 at 19:07
... para head para2end` which I think you want to match (as an error), and `test2para head para2 end` which you would not want to match? Are these possibilities? – Alan McBee Nov 24 '15 at 19:17