EDIT: What I'm looking for below is a regex that does something like this:
- Match text beginning with div class='productBundle' and ending with .html (the match may span several lines).
- Grab all of them (I think this is called a global match; "greedy" refers to how much text a single match consumes)
I'll store these in an array, then fetch the pages. For each page, I'll need to grab the image URL, so I'll need a regex for that as well. I know it's brittle, but it'll get the job done for what I need.
I have a page of HTML containing groups like the following:
<div class='productBundle' id='4086472'>
<table cellpadding="0" cellspacing="0" class='inv'>
<tr><td valign="middle" align="center" width="100%">
<a href="http://listing.com/product/view/4086794.html" alt="472">
I'd like to retrieve all the URLs listed under each div class='productBundle'. There could be any number of these per page, but each URL is always under a productBundle div.
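As a rough sketch of this first step, assuming Python (the question does not name a language), the pattern below matches lazily from each productBundle div to the first href ending in .html and captures that URL. The names `PRODUCT_URL_RE` and `extract_product_urls` are made up for illustration, and the sample string is just the snippet above. It only picks up the first .html link after each div.

```python
import re

# Sample copied from the snippet above; a real page would contain many such groups.
html = """
<div class='productBundle' id='4086472'>
<table cellpadding="0" cellspacing="0" class='inv'>
<tr><td valign="middle" align="center" width="100%">
<a href="http://listing.com/product/view/4086794.html" alt="472">
"""

# Hypothetical pattern: from a productBundle div, skip lazily (.*?) to the
# first href whose value ends in .html, and capture that value.
# re.DOTALL lets "." cross line breaks, since the div and the link are on
# different lines.
PRODUCT_URL_RE = re.compile(
    r"""<div\s+class=['"]productBundle['"].*?href=['"]([^'"]+\.html)['"]""",
    re.DOTALL | re.IGNORECASE,
)


def extract_product_urls(page_html):
    """Return the first .html link found after each productBundle div."""
    # findall returns every non-overlapping match, i.e. "all of them".
    return PRODUCT_URL_RE.findall(page_html)


print(extract_product_urls(html))
```

The lazy `.*?` is what stops each match at the nearest link instead of running on to the last .html on the page, which is what a greedy `.*` would do.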
Then, from each of those HTML pages, I need to get the product image URL:
<img id='productImage' src='http://listing.com/item/472248/472.jpg'>
For example, I need "http://listing.com/item/472248/472.jpg" from the html code above.
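For the second step, something along these lines might work, again as a Python sketch rather than a definitive answer. The names `IMAGE_URL_RE` and `extract_image_url` are hypothetical. The pattern assumes the id attribute appears before src inside the same img tag, and it tolerates the id value being fully quoted, partly quoted, or unquoted.

```python
import re

# Hypothetical pattern: inside a single <img ...> tag, find id=productImage
# (quotes optional on either side) and capture the quoted src value after it.
# [^>]* keeps the match from leaving the tag.
IMAGE_URL_RE = re.compile(
    r"""<img\b[^>]*\bid=['"]?productImage['"]?[^>]*\bsrc=['"]([^'"]+)['"]""",
    re.IGNORECASE,
)


def extract_image_url(page_html):
    """Return the src of the productImage tag, or None if it is not found."""
    match = IMAGE_URL_RE.search(page_html)
    return match.group(1) if match else None


sample = "<img id='productImage' src='http://listing.com/item/472248/472.jpg'>"
print(extract_image_url(sample))
```

If the real pages put src before id, the two attribute parts of the pattern would need to be swapped or made order-independent.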
I could use help with the regex to grab the page URLs in the first part, and then the regex to grab the image URL from the productImage tag.
Thanks