Hi, I have a regular expression:
<a href="(.+?)" class="nextpostslink">
This regex works fine on the following HTML:
'>
<span class='pages'>Page 1 of 12</span><span class='current'>1</span><a href='http://cinemassacre.com/category/avgn/page/2/' class='page larger'>2</a><a href='http://cinemassacre.com/category/avgn/page/3/' class='page larger'>3</a><a href='http://cinemassacre.com/category/avgn/page/4/' class='page larger'>4</a><a href='http://cinemassacre.com/category/avgn/page/5/' class='page larger'>5</a><a href="http://cinemassacre.com/category/avgn/page/2/" class="nextpostslink">»</a><span class='extend'>...</span><a href='http://cinemassacre.com/category/avgn/page/12/' class='last'>Last »</a>
</div> </div>
The part I am trying to extract is the next-page URL from:
<a href="http://cinemassacre.com/category/avgn/page/2/" class="nextpostslink">
But when I run the same regex on this block of HTML:
'>
<span class='pages'>Page 2 of 12</span><a href="http://cinemassacre.com/category/avgn/" class="previouspostslink">«</a><a href='http://cinemassacre.com/category/avgn/' class='page smaller'>1</a><span class='current'>2</span><a href='http://cinemassacre.com/category/avgn/page/3/' class='page larger'>3</a><a href='http://cinemassacre.com/category/avgn/page/4/' class='page larger'>4</a><a href='http://cinemassacre.com/category/avgn/page/5/' class='page larger'>5</a><a href="http://cinemassacre.com/category/avgn/page/3/" class="nextpostslink">»</a><span class='extend'>...</span><a href='http://cinemassacre.com/category/avgn/page/12/' class='last'>Last »</a>
</div>
</div>
It extracts everything from the first <a href=" (the one on the previouspostslink tag) up to " class="nextpostslink">, instead of just the next-page URL.
Why does this happen? I thought (.+?) was non-greedy, so it should match the minimal amount, which should be the URL inside:
<a href="http://cinemassacre.com/category/avgn/page/3/" class="nextpostslink">
The complete Python code I'm using is:
import re
match = re.compile('<a href="(.+?)" class="nextpostslink">', re.DOTALL).findall(pagenav)
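Here is a trimmed-down reproduction of the problem. The `pagenav` string is a shortened stand-in for the page-2 HTML (an assumption made to keep the example small), and the names `lazy` and `strict` are only illustrative. As far as I can tell, the lazy quantifier only keeps the match short from the point where it starts, and the search still starts at the leftmost `<a href="`, which here belongs to the previous-page link. For comparison, the second pattern uses a negated character class `[^"]+` so the capture cannot run past the closing quote of the href. This is a sketch, not necessarily the best way to parse the page.

```python
import re

# Shortened stand-in for the page-2 HTML: a double-quoted "previous" link,
# a single-quoted page link, then the double-quoted "next" link.
pagenav = (
    '<a href="http://cinemassacre.com/category/avgn/" class="previouspostslink">«</a>'
    "<a href='http://cinemassacre.com/category/avgn/page/3/' class='page larger'>3</a>"
    '<a href="http://cinemassacre.com/category/avgn/page/3/" class="nextpostslink">»</a>'
)

# Original pattern: the match begins at the first <a href=" in the string and
# (.+?) expands until " class="nextpostslink"> finally follows.
lazy = re.compile('<a href="(.+?)" class="nextpostslink">', re.DOTALL).findall(pagenav)

# Comparison pattern: [^"]+ cannot cross a double quote, so a match that
# starts at the "previous" link fails and the search moves on.
strict = re.compile('<a href="([^"]+)" class="nextpostslink">').findall(pagenav)

print(lazy)
print(strict)
```

With this input, `lazy` holds one long string that starts inside the previous-page tag, while `strict` holds only the page/3/ URL.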