
I'm using simpleHtmlDom to do some basic screen scraping, but I'm having trouble grabbing product prices. Sometimes it works and sometimes it doesn't. I also sometimes get multiple prices, for example when the website shows something like "normally $100... now $79.99". Any suggestions? Currently, I'm using this:

$prices = array();
$prices[] = $html->find("[class*=price]", 0)->innertext;
$prices[] = $html->find("[class*=msrp]", 0)->innertext;
$prices[] = $html->find("[id*=price]", 0)->innertext;
$prices[] = $html->find("[id*=msrp]", 0)->innertext;
$prices[] = $html->find("[name*=price]", 0)->innertext;
$prices[] = $html->find("[name*=msrp]", 0)->innertext;

One website I have no idea how to grab the price from is Victoria's Secret. The price looks like it's just floating around in random HTML.

Stanley
  • Do you have a particular question? We cannot come up with a one-size-fits-all solution for every possible markup out there. Have a look at http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php for some tips about parsing HTML with PHP. – Gordon Dec 07 '11 at 15:26
  • I'm looking to see what other methods people are using to grab product prices as well as to grab the correct prices. I realize that there isn't a "single solution" to this, but there must be something better than what I'm currently doing. – Stanley Dec 07 '11 at 15:45

1 Answer


First of all, don't use simplehtmldom. Use the built-in DOM functions or a library based on them. If you want to extract all prices from a page, you could try something like this:

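// Single quotes keep the dollar signs literal.
$html = '<html><body>normally $100... now $79.99</body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select every text node that contains a dollar sign.
foreach ($xpath->query('//text()[contains(., "$")]') as $node) {
    // Match amounts such as $100, $1,299 or $79.99, without trailing dots.
    preg_match_all('/(\$\d+(?:,\d{3})*(?:\.\d+)?)/', $node->nodeValue, $m);
    print_r($m);
}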
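
When a page shows both a regular and a sale price, one possible way to pick between them is to convert the matches to numbers and compare. The following is only a sketch under stated assumptions: `extractPrices` is a hypothetical helper name, and treating the lowest amount as the current price is a heuristic that will be wrong on pages listing shipping fees, accessories, or savings amounts.

```php
<?php
// Hypothetical helper (not from the original answer): returns every
// dollar amount found in the text nodes of $html as a float.
function extractPrices(string $html): array
{
    // Real-world markup is often malformed; collect libxml warnings
    // internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $prices = [];
    foreach ($xpath->query('//text()[contains(., "$")]') as $node) {
        // Capture the numeric part only, e.g. "1,299.50" from "$1,299.50".
        if (preg_match_all('/\$(\d+(?:,\d{3})*(?:\.\d+)?)/', $node->nodeValue, $m)) {
            foreach ($m[1] as $amount) {
                // Drop thousands separators before converting.
                $prices[] = (float) str_replace(',', '', $amount);
            }
        }
    }
    return $prices;
}

$html = '<html><body><p>normally $100... now $79.99</p></body></html>';
$prices = extractPrices($html);

if ($prices) {
    // Assumption: lowest amount = current price, highest = list price.
    printf("current: %.2f, list: %.2f\n", min($prices), max($prices));
}
```

Keeping the extraction separate from the "which price is it" decision makes it easier to swap the heuristic per site, for example by restricting the XPath query to a known container element first.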
pguardiario