I want to extract the first paragraph of an article using RegEx and PHP. I started to write a RegEx as below:
    '/<p([^>]*)>(.*)<\/p>/i'
That does the job, but there is one small bug. When the markup is minified onto a single line, like this:
    <p>First Paragraph</p><p>SecondParagraph</p>
it matches the whole line, `<p>First Paragraph</p><p>SecondParagraph</p>`, instead of just the first paragraph.
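A minimal PHP sketch of the usual fix for this symptom, assuming PHP's PCRE functions (`preg_match`): the greedy `.*` runs to the *last* `</p>` on the line, while the lazy `.*?` stops at the *first* one. The attribute group uses `[^>]*` so that a bare `<p>` also matches; the `\b` after `p` (so `<pre>` is not mistaken for a paragraph) and the `s` flag (so `.` also crosses newlines) are additions not in the original pattern, and the variable names are illustrative only.

```php
<?php
// Minified input from the question: two paragraphs on one line.
$html = '<p>First Paragraph</p><p>SecondParagraph</p>';

// Greedy: (.*) grabs as much as possible, then backtracks to the LAST </p>.
$greedy = '/<p([^>]*)>(.*)<\/p>/i';

// Lazy: (.*?) grabs as little as possible, so it stops at the FIRST </p>.
// \b keeps tags like <pre> from matching; s lets . match newlines too.
$lazy = '/<p\b([^>]*)>(.*?)<\/p>/is';

preg_match($greedy, $html, $m);
$greedyMatch = $m[0];

preg_match($lazy, $html, $m);
$lazyMatch = $m[0]; // full match, including the tags
$lazyText  = $m[2]; // only the paragraph content

echo $greedyMatch, PHP_EOL; // <p>First Paragraph</p><p>SecondParagraph</p>
echo $lazyMatch, PHP_EOL;   // <p>First Paragraph</p>
```

The lazy quantifier only fixes the minified case: it still stops at the first `</p>` it sees, so it does not help with the nested markup described next.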
Also, I know that a paragraph cannot be nested inside another one, but I have no control over what users write, so someone may submit markup like the following, for which the RegEx returns an unexpected result:
    <p>
    First Paragraph
    <p>SecondParagraph</p>
    </p>
Now the RegEx matches `<p>First Paragraph<p>SecondParagraph</p>`, but it should extract the whole outer paragraph, `<p>First Paragraph<p>SecondParagraph</p></p>`.
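One way to handle the nested case while staying with regular expressions is PCRE recursion, which PHP's `preg_*` functions support: `(?R)` re-enters the whole pattern, so each nested `<p>...</p>` is consumed as a unit and the final `</p>` balances the opening tag. The function name `firstParagraph` is hypothetical, and this is a sketch under the assumption that the markup is reasonably balanced; it does not understand HTML comments, `<script>` contents, or a literal `<p` inside an attribute value.

```php
<?php
/**
 * Hypothetical helper: return the first top-level <p>...</p> block,
 * including any (invalid) nested <p> elements, or null if none is found.
 * Uses PCRE recursion (?R) and possessive quantifiers (++).
 */
function firstParagraph(string $html): ?string
{
    // x flag: whitespace and # comments inside the pattern are ignored.
    $pattern = '~
        <p\b[^>]*>        # opening <p> tag, with or without attributes
        (?:
            [^<]++        # a run of text containing no tag at all
          | <(?!/?p\b)    # a "<" that starts some other tag (<b>, </b>, <pre>, ...)
          | (?R)          # a nested <p>...</p>: recurse into the whole pattern
        )*
        </p>              # the closing tag that balances the opening tag
    ~ix';

    return preg_match($pattern, $html, $m) === 1 ? $m[0] : null;
}

$minified = '<p>First Paragraph</p><p>SecondParagraph</p>';
$nested   = "<p>\nFirst Paragraph\n<p>SecondParagraph</p>\n</p>";

echo firstParagraph($minified), PHP_EOL; // <p>First Paragraph</p>
echo firstParagraph($nested), PHP_EOL;   // the whole outer paragraph, nested <p> included
```

The possessive `[^<]++` keeps the repeated group from backtracking character by character, which helps avoid runaway matching on long unclosed paragraphs. For markup that is badly broken, an HTML parser is generally the more robust tool than any regex.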