Possible Duplicate:
Preg_match_all <a href
How to parse and process HTML with PHP?

I have used curl to extract the source of a page and need to extract some values from the curl output.

Part of the output looks like this:

<div class="detailInfo">
<label>Manufacturer code/Gas council no:
                </label>BKSWX5506</div>
<div class="detailInfo"></div>
<div class="detailInfo">
<div>
<label>Retail price:</label><span>£12.30</span>
</div>
<div>
<label>Net buying price:</label><span>£7.47</span>
</div>
</div>

From that output I need to get the code that follows "Manufacturer code/Gas council no:" and both prices, each in its own string.

Can anyone help me with this?

Thanks :)

Stuart Taylor-Jones
  • Check out [`DOMDocument`](http://php.net/DOMDocument). It can get you all the values you want from HTML documents; it's a superb tool! – hakre May 03 '12 at 09:23
  • I have looked at DOMDocument, but I need a quick solution and am not familiar with that and would rather use preg_match or similar – Stuart Taylor-Jones May 03 '12 at 09:27
  • Looks like it's this XPath: `$str = $xpath->evaluate('string(//div[@class="detailInfo"]/label)');`. You can't get it quicker than with XPath; regex creates problems, especially if you're not used to it (and if you are, you need more code for the same result). – hakre May 03 '12 at 09:28
  • DOMDocument is the best here. Is your code always the same? I mean the structure? – s.webbandit May 03 '12 at 09:46
  • Yes the code/structure will always be the same. Can you give me an example of DOMDocument code? – Stuart Taylor-Jones May 03 '12 at 09:57
  • @StuartTaylor-Jones: Sure, `$doc = new DOMDocument(); $doc->loadHTML($html); list(, $number, , $retail, , $net) = array_map('trim', simplexml_import_dom($doc)->xpath('//text()[normalize-space(.)]'));` (for the HTML you've provided). – hakre May 03 '12 at 11:44

2 Answers

Try this:

<?php

// Sample markup from the question (in practice, the curl output).
$output = '<div class="detailInfo">
<label>Manufacturer code/Gas council no:
                </label>BKSWX5506</div>
<div class="detailInfo"></div>
<div class="detailInfo">
<div>
<label>Retail price:</label><span>£12.30</span>
</div>
<div>
<label>Net buying price:</label><span>£7.47</span>
</div>
</div>';

// Strip every tag except <label>, turn each opening <label> into a closing one,
// then split on "</label>" so that labels and values alternate in the array.
$outputArray = explode("</label>", str_replace("<label>", "</label>", strip_tags($output, '<label>')));

echo "<pre>";
print_r($outputArray);
echo "</pre>";
?>

Output:

Array
(
    [0] => 

    [1] => Manufacturer code/Gas council no:

    [2] => BKSWX5506




    [3] => Retail price:
    [4] => £12.30



    [5] => Net buying price:
    [6] => £7.47


)
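
The array above still carries surrounding whitespace and an empty first element. A minimal sketch of one way to get the three values into separate strings, assuming the markup structure never changes (the variable names `$parts`, `$code`, `$retail` and `$net` are illustrative, not from the original code):

```php
<?php
// Sample markup from the question; in practice this would be the curl output.
$output = '<div class="detailInfo">
<label>Manufacturer code/Gas council no:
                </label>BKSWX5506</div>
<div class="detailInfo"></div>
<div class="detailInfo">
<div>
<label>Retail price:</label><span>£12.30</span>
</div>
<div>
<label>Net buying price:</label><span>£7.47</span>
</div>
</div>';

// Same split as above: every <label> boundary becomes a delimiter.
$parts = explode('</label>', str_replace('<label>', '</label>', strip_tags($output, '<label>')));

// Trim whitespace, drop empty pieces, and reindex from 0.
$parts = array_values(array_filter(array_map('trim', $parts), 'strlen'));

// After cleaning, labels sit at even indexes and values at odd ones
// (an assumption that only holds while the structure stays fixed).
list(, $code, , $retail, , $net) = $parts;

echo $code, "\n", $retail, "\n", $net, "\n";
```

If a label or value ever goes missing, the positions shift, so this is only as reliable as the page layout.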
Chintan

The following is a general-purpose routine that you can use to find the XPath expressions for the text parts you're looking for. It should give you a starting point, as it also shows how to run an XPath query:

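// The known values whose locations in the document we want to discover.
$searches = array('BKSWX5506', '£12.30', '£7.47');

// $html holds the page source (the curl output).
// The prepended meta tag makes the parser treat the input as UTF-8, so "£" survives.
$doc = new DOMDocument();
$doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'.$html);
$xp = new DOMXPath($doc);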

foreach($searches as $search)
{
    $expression = '//text()[contains(., "'.$search.'")]';
    $result = $xp->query($expression);
    foreach($result as $found)
    {
        /* @var $found DOMNode */
        printf("%s: %s\n", $found->getNodePath(), $found->nodeValue);
    }
}

For the $html content you've provided, it produces the following output:

/html/body/div[1]/text()[2]: BKSWX5506
/html/body/div[3]/div[1]/span/text(): £12.30
/html/body/div[3]/div[2]/span/text(): £7.47

Querying one of these paths returns the corresponding value again:

$number = $xp->evaluate('string(/html/body/div[1]/text()[2])'); # BKSWX5506

As you can see, you can use XPath for both steps: first to analyze a document and locate specific values, then to reuse the paths you found as a pattern for extracting those values.
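
As a variation on the positional paths, the values can also be looked up by their label text. This is a minimal sketch, not part of the routine above; the helper `valueAfterLabel()` is hypothetical, and it assumes the label texts stay exactly as in the question and contain no double quotes:

```php
<?php
// Sample markup from the question; in practice this would be the curl output.
$html = '<div class="detailInfo">
<label>Manufacturer code/Gas council no:
                </label>BKSWX5506</div>
<div class="detailInfo"></div>
<div class="detailInfo">
<div>
<label>Retail price:</label><span>£12.30</span>
</div>
<div>
<label>Net buying price:</label><span>£7.47</span>
</div>
</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Charset hint so that "£" is read as UTF-8.
$doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">' . $html);
$xp = new DOMXPath($doc);

/**
 * Hypothetical helper: returns the trimmed string value of the node that
 * directly follows the <label> whose whitespace-normalized text is $label.
 */
function valueAfterLabel(DOMXPath $xp, string $label): string
{
    $expression = sprintf(
        'string(//label[normalize-space(.)="%s"]/following-sibling::node()[1])',
        $label
    );
    return trim($xp->evaluate($expression));
}

$number = valueAfterLabel($xp, 'Manufacturer code/Gas council no:');
$retail = valueAfterLabel($xp, 'Retail price:');
$net    = valueAfterLabel($xp, 'Net buying price:');

echo $number, "\n", $retail, "\n", $net, "\n";
```

Matching on the label text tolerates extra wrapper elements being added around the values, whereas the positional paths are shorter but break as soon as the element order changes.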

hakre