I've been trying to build a simple scraper that takes a keyword, submits it to Amazon's search box, and scrapes only the main results.
The problem is that my regex isn't working. I've tried many variations, and none of them match properly. Here is my code:
$url = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=dog+bed&x=0&y=0";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$return = curl_exec($ch);
curl_close($ch);
preg_match_all('(<div.*class="data">.*<div class="title">.*<a.*class="title".*href="(.*?)">(.*?)</a>)', $return, $matches);
var_dump($matches);
Now Amazon's HTML code looks like this:
<div class="title">
<a class="title" href="https://rads.stackoverflow.com/amzn/click/com/B00063KG7S" rel="nofollow noreferrer">Midwest 40236 36-By-23-Inch Quiet Time Bolster Pet Bed, Fleece</a>
<span class="ptBrand">by Midwest Homes for Pets</span>
<span class="bindingAndRelease">(Nov 30, 2006)</span>
</div>
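To rule out the cURL fetch, here is a minimal sketch that runs a cut-down version of my pattern against just the snippet above. The $html string and the trimmed pattern (I dropped the div.data part, since the snippet doesn't include it) are assumptions for illustration, not the real page. In this reduced case the pattern finds nothing as written but does match once the s modifier is added, which suggests the line breaks in the markup may be part of the problem.

```php
<?php
// Markup pasted from the snippet above; an assumption standing in for the real Amazon page.
$html = <<<'HTML'
<div class="title">
<a class="title" href="https://rads.stackoverflow.com/amzn/click/com/B00063KG7S" rel="nofollow noreferrer">Midwest 40236 36-By-23-Inch Quiet Time Bolster Pet Bed, Fleece</a>
<span class="ptBrand">by Midwest Homes for Pets</span>
<span class="bindingAndRelease">(Nov 30, 2006)</span>
</div>
HTML;

// Cut-down pattern: same as the original minus the leading div.data part.
$pattern = '<div class="title">.*<a.*class="title".*href="(.*?)">(.*?)</a>';

// As written: "." does not match newlines, so the line break after the div blocks the match.
$plain = preg_match_all('~' . $pattern . '~', $html, $plainMatches);

// With the s (PCRE_DOTALL) modifier, "." also matches newlines.
$dotall = preg_match_all('~' . $pattern . '~s', $html, $dotallMatches);

var_dump($plain, $dotall);   // match counts for each variant
var_dump($dotallMatches[2]); // captured link text, if any
```

Even when it matches, the href group in this sketch also picks up the trailing rel attribute, because the pattern only stops at the first `">`.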
I've tried changing the regex many different ways, but nothing I've learned over the past few months works here. If I reduce the pattern to just href="(.*?)", I get every link on the page, but not when I add in the
Any advice would be appreciated!