I'm trying to extract certain URLs from HTML (for example, all that begin with http, contain /tempfiles/ and end in .jpg). I have something like:
http.*?\/tempfiles\/.*?\.jpg
The problem is when I have HTML like:
blah blah <img src=http://somelink/file.html>http://server/tempfiles/blah.jpg
blah blah
It returns http://somelink/file.html etc
more junk http://server/tempfiles/blah.jpg
Is there a way to say there must not be a second http between the first and the /tempfiles/?
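One possible approach is a negative lookahead applied at every character between the leading http and /tempfiles/ (sometimes called a tempered dot). The sketch below uses Python's re module, which is an assumption since no regex flavor is named here. The sample string and the variable names are made up for illustration. The same pattern should carry over to other engines that support lookahead.

```python
import re

# Hypothetical sample text modelled on the HTML in the question.
html = (
    "blah blah <img src=http://somelink/file.html>"
    "http://server/tempfiles/blah.jpg\nblah blah"
)

# Original pattern: the lazy .*? still begins at the first "http"
# and keeps expanding until it finds "/tempfiles/".
naive = re.compile(r"http.*?/tempfiles/.*?\.jpg")

# (?:(?!http).)*? consumes one character at a time, but only at
# positions where "http" does not start, so the stretch before
# "/tempfiles/" cannot contain a second "http".
guarded = re.compile(r"http(?:(?!http).)*?/tempfiles/.*?\.jpg")

naive_matches = naive.findall(html)
guarded_matches = guarded.findall(html)

print(naive_matches)    # match starts at http://somelink/...
print(guarded_matches)  # match starts at http://server/...
```

The lookahead only guards the part before /tempfiles/. The trailing .*? could still run past another http while looking for .jpg, so the same (?:(?!http).)*? construct may be worth using there as well.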