I would like to select elements by the following class using Simple HTML DOM,

but there are two variants of the class attribute. One is

class="price"

and the other appears to be class=" price"

Using this code does not appear to find the second one:

foreach ($html1->find('[class= price ]/text()',0) as $price_data2)

The source of the page in question is here:

http://www.amazon.com/Likeable-Social-Media-Irresistible-ebook/dp/B00511ONPG/ref=tmm_kin_title_0?ie=UTF8&qid=1367741120&sr=8-1

Andrew Barber
  • Try `$html1->find('.price')` instead. Then fetch the text-value from the result. If this does not help, take a DOM Level4 compatible HTML library that offers [`DOMDocument::getElementsByClassName`](http://www.w3.org/TR/domcore/#dom-document-getelementsbyclassname). – hakre May 05 '13 at 08:41
  • this just grabs the class="price" and not the one with 2 spaces – user2349095 May 05 '13 at 08:45
  • Sure, because by definition the class attribute is a space-separated list of classnames. This is explained in the HTML and CSS documentation. You normally want to make use of that when scraping HTML, so I dunno why this is an issue for you. You might want to extend the condition. Otherwise if you want to filter by the exact string-value of the argument, find all tags that have a class attribute and then filter against that string value. – hakre May 05 '13 at 08:45
  • You can also just use a library that parses HTML *and* offers XPath. In XPath it is easy to search for a tag with an attribute containing an exact value. You might find some inspiration in [*"How to parse and process HTML/XML?"*](http://stackoverflow.com/q/3577641/367456) [PHP Reference Material] – hakre May 05 '13 at 08:50
  • A space in front of a class name should not be an issue. ` class` should be reduced to `class` automatically. – Pekka May 05 '13 at 08:54
  • possible duplicate of [XPath: How to match attributes that contain a certain string](http://stackoverflow.com/questions/1390568/xpath-how-to-match-attributes-that-contain-a-certain-string) – Ja͢ck May 06 '13 at 04:21
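
The difference discussed in the comments above (class as a space-separated list versus the exact attribute string) can be sketched with PHP's built-in DOMDocument and DOMXPath. This is only an illustration under assumptions: the markup in `$html` is a made-up sample, not the Amazon page, and the variable names are hypothetical.

```php
<?php
// Hypothetical sample markup; the real page is not fetched here.
$html = '<div><span class="price">$1.00</span>'
      . '<b class=" price ">$2.00</b>'
      . '<i class="price old">$3.00</i></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Token match: treat class as a space-separated list, like the selector ".price".
$tokenQuery = "//*[contains(concat(' ', normalize-space(@class), ' '), ' price ')]";
$tokenMatches = [];
foreach ($xpath->query($tokenQuery) as $node) {
    $tokenMatches[] = $node->textContent;
}

// Exact match: take every element that has a class attribute,
// then compare the raw attribute string, spaces included.
$exactMatches = [];
foreach ($xpath->query('//*[@class]') as $node) {
    if ($node->getAttribute('class') === ' price ') {
        $exactMatches[] = $node->textContent;
    }
}

print_r($tokenMatches);
print_r($exactMatches);
```

The token query matches all three sample elements, while the exact comparison keeps only the element whose attribute is literally `' price '`.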

2 Answers

An example with DOMDocument querying the class attribute value verbatim (with spaces around):

// configuration
libxml_use_internal_errors(true);

// input
$url = 'http://www.amazon.com/Likeable-Social-Media-Irresistible-ebook/dp/B00511ONPG/ref=tmm_kin_title_0?ie=UTF8&qid=1367741120&sr=8-1';

// processing
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$prices  = $xpath->query("//*[@class=' price ']/text()");

// output
foreach($prices as $index => $price) {
    printf("%d: %s\n", $index, trim($price->textContent));
}

Output:

0: $14.81
1: $18.38
2: $11.58
3: --
4: 
5: 

Please note that the page at the URL you gave contains invalid HTML. The Simple HTML DOM parser might therefore produce different results (or not work at all) with that data. The same caveat applies to the DOMDocument object I use here; however, it is built on top of the stable libxml library (widely used outside the PHP world as well), and it also has a recover property, which allows further control.
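
As a rough sketch of how DOMDocument copes with invalid markup, the snippet below parses a deliberately broken fragment (unclosed p tags) and collects whatever the parser reports instead of printing warnings. The fragment in `$broken` is a hypothetical stand-in, not the Amazon page. Setting `recover` is shown only for completeness; as far as I know the HTML loader is already lenient, so treat its effect here as an assumption.

```php
<?php
// Hypothetical broken fragment: the <p> elements are never closed.
$broken = '<div><p class=" price ">$14.81<p class=" price ">--</div>';

libxml_use_internal_errors(true);   // collect parser messages instead of emitting warnings
$doc = new DOMDocument();
$doc->recover = true;               // assumption: mostly relevant for XML; harmless for HTML
$doc->loadHTML($broken);

// Messages gathered while parsing; how many there are depends on the libxml version.
$errors = libxml_get_errors();
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query("//*[@class=' price ']/text()") as $textNode) {
    $texts[] = trim($textNode->nodeValue);
}

printf("parser messages: %d\n", count($errors));
print_r($texts);
```

Even though the input is not valid, the parser closes the open p elements on its own, so the same verbatim class query still finds both text nodes.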

hakre

You should be able to use:

$html->find('*[class*=price]/text()')

I don't like that /text() though, because it's not real CSS.

Also note that you need to leave out the ,0 when iterating with foreach: with an index, find() returns a single element rather than an array.
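
For comparison, here is a minimal sketch of the same two ideas using the built-in DOMXPath rather than Simple HTML DOM: a substring match on the class attribute (comparable to `[class*=price]`) and the difference between iterating over a result list and picking one item by index. The sample markup and variable names are assumptions for illustration.

```php
<?php
// Hypothetical sample markup for illustration only.
$html = '<ul><li class="price">$1.00</li>'
      . '<li class=" price ">$2.00</li>'
      . '<li class="list-price-label">List</li></ul>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Substring match on the raw attribute value, comparable to [class*=price].
$nodes = $xpath->query("//*[contains(@class, 'price')]/text()");

// Iterate over the whole result list...
$all = [];
foreach ($nodes as $text) {
    $all[] = trim($text->nodeValue);
}

// ...or pick one node by index; a single node is not a list to loop over.
$first = $nodes->item(0)->nodeValue;

print_r($all);
echo $first, "\n";
```

Note that a substring match is broader than a class-token match: in this sample it also picks up `list-price-label`, which may or may not be what you want.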

pguardiario