I'm trying to write a Python parser to extract some information from HTML pages.
It should extract the text between <p itemprop="xxx"> and </p>.
I use this regular expression:
m = re.search(ur'p>(?P<text>[^<]*)</p>', html)
but it fails to match when there are other tags between them. For example:
<p itemprop="xxx"> some text <br/> another text </p>
As I understand it, [^<] excludes only a single character. How do I write "everything except </p>"?
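
For reference, here is a minimal sketch of two ways "everything except </p>" is commonly expressed with the re module. It assumes the paragraph is not nested inside another <p> and that the opening tag is written exactly as <p itemprop="xxx">. The names html, lazy and tempered are only for illustration. For arbitrary real-world pages an HTML parser is usually more robust than a regex.

```python
import re

html = '<p itemprop="xxx"> some text <br/> another text </p>'

# Option 1: non-greedy .*? consumes as little as possible, so it stops at
# the first </p>. re.DOTALL lets . match newlines inside the paragraph.
lazy = re.compile(r'<p itemprop="xxx">(?P<text>.*?)</p>', re.DOTALL)

# Option 2: a negative lookahead checked before every character, which is
# the literal form of "any character, as long as </p> does not start here".
tempered = re.compile(
    r'<p itemprop="xxx">(?P<text>(?:(?!</p>).)*)</p>', re.DOTALL
)

m_lazy = lazy.search(html)
m_tempered = tempered.search(html)

if m_lazy:
    print(repr(m_lazy.group('text')))
if m_tempered:
    print(repr(m_tempered.group('text')))
```

Both patterns capture the inner text including the <br/> tag, so any nested tags would still need to be stripped afterwards if only plain text is wanted.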