How can I find all URLs in an HTML document using a regular expression? I only need URLs that point to pages, so I want to exclude URLs that end with ".css", ".jpg", ".js", etc.
Example HTML:
<a href=index.php?option=content&task=view&id=2&Itemid=25 class="menu_selected" id="">Home</a>
or
<a href="http://data.stackexchange.com">data</a> |
<a href="http://shop.stackexchange.com/">shop</a> |
<a href="http://stackexchange.com/legal">legal</a> |
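A minimal sketch in Python, assuming only `href` attributes of `<a>` tags matter and that the value may be double-quoted, single-quoted, or unquoted (as in the `index.php` example). The names `HREF_RE`, `EXCLUDED_EXTENSIONS`, `is_page_url` and `find_page_urls` are hypothetical, and the extension list is an assumed stand-in for the "etc." in the question. Regular expressions are fragile on real-world HTML (comments, scripts, odd attribute layouts), so an actual HTML parser such as Python's `html.parser` is the more robust choice; this only shows the regex approach being asked about.

```python
import re
from urllib.parse import urlsplit

# Matches the href attribute of an <a> tag whose value is double-quoted,
# single-quoted, or unquoted. Requiring whitespace before "href" avoids
# matching attributes like "data-href".
HREF_RE = re.compile(
    r"""<a\s+(?:[^>]*?\s)?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+))""",
    re.IGNORECASE,
)

# Assumed list of non-page file types; extend it as needed.
EXCLUDED_EXTENSIONS = (".css", ".js", ".jpg", ".jpeg", ".png", ".gif", ".ico")


def is_page_url(url: str) -> bool:
    """Return False if the URL's path ends with an excluded extension."""
    # Only the path is checked, so a query string such as "?v=2"
    # does not hide the extension.
    path = urlsplit(url).path.lower()
    return not path.endswith(EXCLUDED_EXTENSIONS)


def find_page_urls(html: str) -> list[str]:
    """Return href values of <a> tags, skipping excluded file types."""
    urls = []
    for match in HREF_RE.finditer(html):
        # Exactly one of the three alternatives participates in each match.
        url = next(g for g in match.groups() if g is not None)
        if is_page_url(url):
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = """
    <a href=index.php?option=content&task=view&id=2&Itemid=25 class="menu_selected" id="">Home</a>
    <a href="http://data.stackexchange.com">data</a> |
    <a href="http://shop.stackexchange.com/">shop</a> |
    <a href="http://stackexchange.com/legal">legal</a> |
    <a href="/images/logo.jpg?v=2">logo</a>
    """
    for url in find_page_urls(sample):
        print(url)
```

Run on the sample, this prints the four page URLs from the question and skips the `.jpg` link. Checking the extension on the parsed path rather than on the raw string is what lets `logo.jpg?v=2` be excluded. HTML entities such as `&amp;` inside `href` values are returned as-is, not decoded.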