I have the following HTML code:
<div id="page126-div" style="position:relative;width:918px;height:1188px;">
</div>
<div id="page127-div" style="position:relative;width:918px;height:1188px;">
sometext for example
</div>
<div id="page128-div" style="position:relative;width:918px;height:1188px;">
</div>
My task is to match empty divs. In this context, empty means that they have no content at all (no characters between the opening > and the closing <), or contain just a space or a newline, or fewer than 5 characters. So emptiness is pretty fuzzy.
If I wanted to match all divs, not only the empty ones, I would use the following regex:
\<div id="page.*?"\>.*?\<\/div\>
Naturally, I use it with the dotall modifier, so that the dot also matches newlines.
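For reference, here is a minimal sketch of that "match all divs" case. It assumes Python's `re` module with `re.DOTALL` as the dotall modifier; the variable names are only for illustration, and other regex engines may treat escapes such as `\<` differently.

```python
import re

# The sample HTML from above, as one string.
html = """<div id="page126-div" style="position:relative;width:918px;height:1188px;">
</div>
<div id="page127-div" style="position:relative;width:918px;height:1188px;">
sometext for example
</div>
<div id="page128-div" style="position:relative;width:918px;height:1188px;">
</div>"""

# re.DOTALL lets "." match newlines, so ".*?" can span the div body.
all_divs = re.compile(r'\<div id="page.*?"\>.*?\<\/div\>', re.DOTALL)
all_matches = [m.group(0) for m in all_divs.finditer(html)]

# Each div is matched separately.
print(len(all_matches))  # prints 3
```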
But to match only the empty divs, I tried this expression:
\<div id="page.*?"\>.{0,5}?\<\/div\>
I expect to get the first and the last (third) divs, because each of them consists of an opening div tag with attributes, then content of 0 to 5 characters, and a closing div tag. The first match is right, but the second match is the second and third divs stacked together instead of the third div only. I do not understand why.
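A minimal sketch that reproduces the behaviour, assuming Python's `re` module with `re.DOTALL` as the dotall modifier (the variable names and the helper pattern for listing ids are only for illustration):

```python
import re

# The sample HTML from above, as one string.
html = """<div id="page126-div" style="position:relative;width:918px;height:1188px;">
</div>
<div id="page127-div" style="position:relative;width:918px;height:1188px;">
sometext for example
</div>
<div id="page128-div" style="position:relative;width:918px;height:1188px;">
</div>"""

# The "empty div" pattern from the question, with dotall enabled.
empty_divs = re.compile(r'\<div id="page.*?"\>.{0,5}?\<\/div\>', re.DOTALL)
matches = [m.group(0) for m in empty_divs.finditer(html)]

# List which page ids ended up inside each match.
for text in matches:
    print(re.findall(r'id="(page\d+-div)"', text))
# prints ['page126-div']
# prints ['page127-div', 'page128-div']
```

Only two matches come back, and the second one starts at the opening tag of page127-div and runs through the closing tag of page128-div.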